h2. Hadoop on EC2

* http://www.cloudera.com/hadoop-ec2
* http://www.cloudera.com/hadoop-ec2-ebs-beta


h3. Set up NFS within the cluster

* http://nfs.sourceforge.net/nfs-howto/ar01s03.html


h3. Miscellaneous Hadoop Tips

* The Cloudera AMIs and distribution include BZip2 support.  This means that input files with a .bz2 extension are decompressed transparently and streamed to your job. (Note that there is a non-trivial penalty for doing so: each bzipped file must go, in whole, to a single mapper, and the CPU load for decompression is sizeable.)

* To _produce_ bzip2 files, specify the new @--compress_output=@ flag.  If you have the BZip2 patches installed, you can give @--compress_output=bz2@; everyone should be able to use @--compress_output=gz@.

* For excellent performance you can patch your install for "Parallel LZO Splitting":http://www.cloudera.com/blog/2009/06/24/parallel-lzo-splittable-compression-for-hadoop/
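
The single-mapper penalty noted above follows from how a bzip2 stream is read: a decoder starts at the stream header and works forward. The sketch below illustrates this with Python's standard @bz2@ module only; it is not Hadoop's or Wukong's code. The helper names (@stream_lines@, @chunked@) and the sample records are made up for the illustration.

```python
import bz2


def stream_lines(compressed_chunks):
    """Yield text lines from an iterable of bzip2-compressed byte chunks.

    Chunks must arrive in order, starting at the beginning of the stream.
    """
    decompressor = bz2.BZ2Decompressor()
    pending = b""
    for chunk in compressed_chunks:
        pending += decompressor.decompress(chunk)
        # Everything before the last newline is complete; keep the remainder.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line.decode("utf-8")
    if pending:
        yield pending.decode("utf-8")


def chunked(data, size):
    """Split a bytes object into consecutive pieces of at most `size` bytes."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Hypothetical tab-separated records standing in for a job's input file.
records = "".join(f"record-{i}\tvalue-{i}\n" for i in range(1000))
compressed = bz2.compress(records.encode("utf-8"))

# Reading from the start, in small pieces, works: this is the streaming case.
lines = list(stream_lines(chunked(compressed, 64)))

# Starting from an arbitrary byte offset does not: the decoder expects the
# stream header first, so a naive byte-range split cannot be decoded.
try:
    bz2.BZ2Decompressor().decompress(compressed[len(compressed) // 2:])
    mid_file_failed = False
except OSError:
    mid_file_failed = True

print(len(lines), "lines recovered; mid-file read failed:", mid_file_failed)
```

Every byte of the file passes through one decoder in order, which is why the decompression cost lands on a single task rather than being spread across the cluster.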


h3. Tools for EC2 and S3 Management

* http://s3sync.net/wiki
* http://jets3t.s3.amazonaws.com/applications/applications.html#uploader
* ElasticFox
* S3Fox (S3 Organizer)
* FoxyProxy
