= ganapati -- Hadoop HDFS Thrift interface for Ruby Ganapati is a Ruby thrift lib for interfacing with Hadoop's distributed file system, HDFS. It also includes a few command line client utilities. To install: gem install ganapati == Starting thrift server Documentation in Hadoop for the thrift interface to HDFS is crap. It can be found here[http://wiki.apache.org/hadoop/HDFS-APIs]. As a much simpler and safer way of auto compiling and then starting the thrift interface, use the provided script: bin/hdfs_thrift_server This will start a thrift server on the given port (after compiling the server code provided in the Hadoop distribution). == Basic Usage require 'rubygems' require 'ganapati' # args are host, port, and optional timeout client = Ganapati::Client.new 'localhost', 1234 # copy a file to hdfs client.put("/some/file", "/some/hadoop/path") # get a file from hadoop client.get("/some/hadoop/path", "/local/path") # Create a file f = client.create("/home/someuser/afile.txt") f.write("this is some text") # Always, always close the file f.close # Create a file with code block client.create("/home/someuser/afile.txt") { |f| f.write("this is some text") } # Open a file for reading and read it client.open('/home/someuser/afile.txt') { |f| puts f.read # or read for specific length from start puts f.read(0, 4) } # read a file line by line client.readlines('/home/someuser/afile.txt') { |line| puts line } # Open a file for appending and append to it client.append('/home/someuser/afile.txt') { |f| f.write "new data" } ## Common file methods are available (chown, chmod, mkdir, stat, etc). Examples: # move a file client.mv "/home/someuser/afile.txt", "/home/someuser/test.txt" # remove a file client.rm "/home/someuser/test.txt" # test for file existance client.exists? "/home/someuser/test.txt" # get a list of all files client.ls "/home" client.close # Quick and dirty way to print remote file. The run class method takes care of closing the client. puts Ganapati::Client.run('localhost', 1234) { |c| c.open('/home/someuser/afile.txt') { |f| f.read } } == Command Line Utilities There are a few utility programs included in the bin directory. *hls* provides a way to see the contents of HDFS (recursively and verbosely with appropriate command line options): ./bin/hls hdfs://host:port/tmp *hcp* provides a way to copy to/from/between HDFS servers: ./bin/hcp hdfs://host:port/some/path/to/file ./file ./bin/hcp ./file hdfs://host:port/some/path/to/file ./bin/hcp hdfs://anotherhost:port/some/path/to/file hdfs://host:port/some/path/to/file