java.lang.Object
  org.apache.hadoop.mapreduce.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
      org.apache.cassandra.hadoop.ColumnFamilyInputFormat

public class ColumnFamilyInputFormat
extends org.apache.hadoop.mapreduce.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
Hadoop InputFormat allowing map/reduce against Cassandra rows within one ColumnFamily.

At minimum, you need to set the ColumnFamily and the predicate (a description of the columns to extract from each row) in your Hadoop job Configuration. The ConfigHelper class is provided to make this simple:
    ConfigHelper.setColumnFamily
    ConfigHelper.setSlicePredicate

You can also configure the number of rows per InputSplit with
    ConfigHelper.setInputSplitSize
This should be "as big as possible, but no bigger." Each InputSplit is read from Cassandra with multiple get_range_slices queries, and the per-call overhead of get_range_slices is high, so larger split sizes are better -- but if the split is too large, you will run out of memory. The default split size is 64k rows.
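For illustration, the following is a minimal job-setup sketch along the lines described above. It is not part of this class's API: the keyspace name, column family name, and column name are placeholders, additional connection settings (contact host, RPC port, partitioner) may be required depending on your Cassandra version, and the ConfigHelper setter signatures (taking the job's Configuration as the first argument) should be checked against the ConfigHelper Javadoc for your version.

    import java.util.Arrays;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.mapreduce.Job;

    public class JobSetupSketch
    {
        public static void main(String[] args) throws Exception
        {
            Job job = new Job();
            job.setJobName("cassandra-input-example");

            // Required: which keyspace / column family to read (placeholder names).
            ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");

            // Required: which columns to extract from each row.
            SlicePredicate predicate = new SlicePredicate()
                    .setColumn_names(Arrays.asList(ByteBufferUtil.bytes("text")));
            ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

            // Optional: rows per InputSplit ("as big as possible, but no bigger"; default 64k rows).
            ConfigHelper.setInputSplitSize(job.getConfiguration(), 64 * 1024);

            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            // ... configure mapper, reducer, and output as usual, then submit:
            job.waitForCompletion(true);
        }
    }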
Constructor Summary

ColumnFamilyInputFormat()
Method Summary

 org.apache.hadoop.mapreduce.RecordReader<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
        createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
            Create a record reader for a given split.

 java.util.List<org.apache.hadoop.mapreduce.InputSplit>
        getSplits(org.apache.hadoop.mapreduce.JobContext context)
            Logically split the set of input files for the job.
Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

ColumnFamilyInputFormat

public ColumnFamilyInputFormat()
Method Detail

getSplits

public java.util.List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
        throws java.io.IOException

Description copied from class: org.apache.hadoop.mapreduce.InputFormat
Logically split the set of input files for the job. Each InputSplit is then assigned to an individual Mapper for processing.

Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For e.g. a split could be <input-file-path, start, offset> tuple. The InputFormat also creates the RecordReader to read the InputSplit.

Specified by:
    getSplits in class org.apache.hadoop.mapreduce.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
Parameters:
    context - job configuration.
Returns:
    the InputSplits for the job.
Throws:
    java.io.IOException
createRecordReader

public org.apache.hadoop.mapreduce.RecordReader<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>> createRecordReader(org.apache.hadoop.mapreduce.InputSplit inputSplit,
        org.apache.hadoop.mapreduce.TaskAttemptContext taskAttemptContext)
        throws java.io.IOException, java.lang.InterruptedException

Description copied from class: org.apache.hadoop.mapreduce.InputFormat
Create a record reader for a given split. The framework will call RecordReader.initialize(InputSplit, TaskAttemptContext) before the split is used.

Specified by:
    createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<java.nio.ByteBuffer,java.util.SortedMap<java.nio.ByteBuffer,IColumn>>
Parameters:
    inputSplit - the split to be read
    taskAttemptContext - the information about the task
Throws:
    java.io.IOException
    java.lang.InterruptedException
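To show what a consumer of this InputFormat receives, here is a sketch of a Mapper parameterized with the key/value types produced above: a ByteBuffer row key and a SortedMap<ByteBuffer,IColumn> of the columns selected by the SlicePredicate. The class name, output types, and per-column counting logic are purely illustrative assumptions, not part of this class.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.cassandra.utils.ByteBufferUtil;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class RowMapper extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, IntWritable>
    {
        @Override
        protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
                throws IOException, InterruptedException
        {
            // One call per row: the row key plus the columns selected by the SlicePredicate.
            for (IColumn column : columns.values())
            {
                String name = ByteBufferUtil.string(column.name()); // decode the column name
                context.write(new Text(name), new IntWritable(1));  // illustrative: count occurrences per column name
            }
        }
    }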