README.rdoc

Path: README.rdoc
Last Update: Thu Mar 31 11:09:15 -0400 2011

ganapati — Hadoop HDFS Thrift interface for Ruby

Ganapati is a Ruby Thrift library for interfacing with Hadoop's distributed file system, HDFS. It also includes a few command-line client utilities.

To install:

  gem install ganapati

Starting the Thrift server

Hadoop's own documentation for the Thrift interface to HDFS is poor. It can be found here.

For a much simpler and safer way to automatically compile and then start the Thrift interface, use the provided script:

  bin/hdfs_thrift_server <port>

This will start a thrift server on the given port (after compiling the server code provided in the Hadoop distribution).
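
Before pointing a client at the server, it can help to confirm that something is actually listening on the chosen port. The sketch below uses only Ruby's standard socket library; the helper name thrift_port_open? is hypothetical and not part of ganapati. Note that a successful TCP connect only shows that some process accepts connections on that port, not that it speaks the HDFS Thrift protocol.

```ruby
require 'socket'

# Hypothetical helper (not part of ganapati): returns true if a TCP
# connection to host:port succeeds within `timeout` seconds.
def thrift_port_open?(host, port, timeout = 2)
  sock = Socket.tcp(host, port, connect_timeout: timeout)
  sock.close
  true
rescue StandardError
  # Connection refused, timed out, or the host could not be resolved.
  false
end
```

For example, thrift_port_open?('localhost', 1234) could be checked before calling Ganapati::Client.new 'localhost', 1234.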

Basic Usage

  require 'rubygems'
  require 'ganapati'

  # args are host, port, and optional timeout
  client = Ganapati::Client.new 'localhost', 1234

  # copy a file to hdfs
  client.put("/some/file", "/some/hadoop/path")

  # copy a file from hdfs to the local file system
  client.get("/some/hadoop/path", "/local/path")

  # Create a file
  f = client.create("/home/someuser/afile.txt")
  f.write("this is some text")
  # Always, always close the file
  f.close

  # Create a file with code block
  client.create("/home/someuser/afile.txt") { |f|
    f.write("this is some text")
  }

  # Open a file for reading and read it
  client.open('/home/someuser/afile.txt') { |f|
    puts f.read
    # or read for specific length from start
    puts f.read(0, 4)
  }

  # read a file line by line
  client.readlines('/home/someuser/afile.txt') { |line|
    puts line
  }

  # Open a file for appending and append to it
  client.append('/home/someuser/afile.txt') { |f|
    f.write "new data"
  }

  ## Common file methods are available (chown, chmod, mkdir, stat, etc.).  Examples:
  # move a file
  client.mv "/home/someuser/afile.txt", "/home/someuser/test.txt"

  # remove a file
  client.rm "/home/someuser/test.txt"

  # test for file existence
  client.exists? "/home/someuser/test.txt"

  # list the files in a directory
  client.ls "/home"

  client.close

  # Quick and dirty way to print a remote file.  The run class method takes care of closing the client.
  puts Ganapati::Client.run('localhost', 1234) { |c| c.open('/home/someuser/afile.txt') { |f| f.read } }
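
The block forms of create, open, and append, and the Ganapati::Client.run class method, follow a common Ruby idiom: acquire a resource, yield it to the block, and release it in an ensure clause so it is closed even if the block raises, while returning the block's value. The sketch below illustrates that idiom in plain Ruby with a hypothetical FakeRemoteFile class standing in for an HDFS file handle; it is not ganapati's actual implementation.

```ruby
# Hypothetical stand-in for a remote file handle; writes are buffered in memory.
class FakeRemoteFile
  attr_reader :data

  def initialize
    @data = ''.dup
    @closed = false
  end

  def write(str)
    raise IOError, 'file is closed' if @closed
    @data << str
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end

  # With a block: yield the handle, always close it, and return the block's value.
  # Without a block: return the open handle and leave closing to the caller.
  def self.create
    file = new
    return file unless block_given?

    begin
      yield file
    ensure
      file.close
    end
  end
end
```

Without a block, the caller gets an open handle back and is responsible for calling close, which is why the non-block examples above insist on closing the file.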

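A line-by-line reader such as readlines has to cope with lines that straddle the boundaries of the chunks fetched from the server. The sketch below shows one way to do that in plain Ruby, assuming the file arrives as an enumerable of string chunks; the method name each_line_from_chunks is hypothetical, and this is not necessarily how ganapati implements readlines.

```ruby
# Hypothetical helper: yields complete lines (including their trailing "\n")
# from a sequence of arbitrary string chunks, carrying partial lines forward.
def each_line_from_chunks(chunks)
  buffer = ''.dup
  chunks.each do |chunk|
    buffer << chunk
    # Emit every complete line currently sitting in the buffer.
    while (idx = buffer.index("\n"))
      yield buffer.slice!(0..idx)
    end
  end
  # Whatever remains is a final line without a trailing newline.
  yield buffer unless buffer.empty?
end
```

The buffer holds at most one partial line plus the current chunk, so memory use stays bounded by the chunk size and the longest line rather than by the file size.
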
Command Line Utilities

There are a few utility programs included in the bin directory. hls provides a way to see the contents of HDFS (recursively and verbosely with appropriate command line options):

  ./bin/hls hdfs://host:port/tmp
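
A recursive listing of the kind hls offers amounts to a depth-first walk: list a directory, report each entry, and descend into each subdirectory. The sketch below assumes a hypothetical lister callable that returns [name, is_directory] pairs for a path, stubbed here with an in-memory hash; it illustrates the traversal only and is neither the code in bin/hls nor ganapati's real ls return format.

```ruby
# Hypothetical depth-first walk: `lister` is any callable that, given a path,
# returns an array of [child_name, is_directory] pairs.
def walk(path, lister, depth = 0, &block)
  lister.call(path).sort_by { |name, _| name }.each do |name, is_dir|
    child = path == '/' ? "/#{name}" : "#{path}/#{name}"
    block.call(child, depth)
    walk(child, lister, depth + 1, &block) if is_dir
  end
end

# In-memory stand-in for an HDFS directory tree.
TREE = {
  '/tmp'      => [['logs', true], ['a.txt', false]],
  '/tmp/logs' => [['day1.log', false]]
}.freeze
LISTER = ->(p) { TREE.fetch(p, []) }
```

Calling walk('/tmp', LISTER) { |p, d| puts "#{'  ' * d}#{p}" } prints each path indented by its depth below /tmp.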

hcp provides a way to copy to/from/between HDFS servers:

  ./bin/hcp hdfs://host:port/some/path/to/file ./file
  ./bin/hcp ./file hdfs://host:port/some/path/to/file
  ./bin/hcp hdfs://anotherhost:port/some/path/to/file hdfs://host:port/some/path/to/file
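
Because hcp accepts either a local path or an hdfs://host:port/path URI on each side, it has to tell the two apart before deciding whether to put, get, or copy between servers. A minimal sketch of that dispatch using Ruby's standard uri library follows; the method names hdfs_target and copy_direction are hypothetical and are not taken from bin/hcp.

```ruby
require 'uri'

# Hypothetical: returns { host:, port:, path: } for an hdfs:// URI, or nil
# for anything else (which is then treated as a local path).
def hdfs_target(arg)
  uri = URI.parse(arg)
  return nil unless uri.scheme == 'hdfs'

  { host: uri.host, port: uri.port, path: uri.path }
rescue URI::InvalidURIError
  nil
end

# Hypothetical: classify a copy by which side(s) live on HDFS.
def copy_direction(src, dst)
  case [!hdfs_target(src).nil?, !hdfs_target(dst).nil?]
  when [true, false] then :get     # HDFS -> local
  when [false, true] then :put     # local -> HDFS
  when [true, true]  then :between # HDFS -> HDFS
  else :local
  end
end
```

The host and port returned by hdfs_target are what a client such as Ganapati::Client.new would need for each remote side.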
