README.md in elasticity-6.0.10 vs README.md in elasticity-6.0.11

- line removed (present only in 6.0.10)
+ line added (new in 6.0.11)
Unprefixed lines are unchanged context.

@@ -139,11 +139,11 @@
### EMR Applications (optional needs release_label >= 4.0.0)

With the release of EMR 4.0.0 you can now supply applications which EMR will install for you on boot(rather than a manual bootstrap action. Which you can still use if required). You must set the `release_label` for the jobflow(>=4.0.0)

```ruby
-jobflow.release_label = '4.3.0'
+jobflow.release_label = '4.3.0' # the simple way
jobflow.add_application("Spark") # Pig, Hive, Mahout

# more verbose
spark = Elasticity::Application.new({
  name: 'Spark',

@@ -339,13 +339,47 @@
jobflow.add_step(copy_step)

# For AMI < 4.x you need to specifify legacy argument
copy_step = Elasticity::S3DistCpStep.new(true)
+```
+### Adding a Scalding Step
+
+```ruby
+scalding_step = Elasticity::ScaldingStep.new('jar_location', 'main_class_fqcn', { 'arg1' => 'value1' })
+
+jobflow.add_step(scalding_step)
```
+This will result in the following command line arguments:
+
+```bash
+main_class_fqcn --hdfs --arg1 value1
+```
+
+### Adding a Spark Step
+
+```ruby
+spark_step = Elasticity::SparkStep.new('jar_location', 'main_class_fqcn')
+
+# Specifying arguments relative to Spark
+spark_step.spark_arguments = { 'driver-memory' => '2G' }
+# Specifying arguments relative to your application
+spark_step.app_arguments = { 'arg1' => 'value1' }
+```
+
+This will be equivalent to the following script:
+
+```bash
+spark-submit \
+  --driver-memory 2G \
+  --class main_class_fqcn \
+  jar_location \
+  --arg1 value1
+```
+
## 7 - Upload Assets (optional)

This isn't part of ```JobFlow```; more of an aside. Elasticity provides a very basic means of uploading assets to S3 so that your EMR job has access to them. Most commonly this will be a set of resources to run the job (e.g. JAR files, streaming scripts, etc.) and a set of resources used by the job itself (e.g. a TSV file with a range of valid values, join tables, etc.).

```ruby

@@ -420,10 +454,10 @@
Elasticity.configure do |config|

  # AWS credentials
  config.access_key = ENV['AWS_ACCESS_KEY_ID']
  config.secret_key = ENV['AWS_SECRET_ACCESS_KEY']
-
+  # if you use federated Identity Management
  #config.security_token = ENV['AWS_SECURITY_TOKEN']

  # If using Hive, it will be configured via the directives here
  config.hive_site = 's3://bucket/hive-site.xml'
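For readers skimming the first hunk: it cuts off midway through the verbose `Elasticity::Application` form. The sketch below shows one way the block might be completed and attached to the jobflow; the `arguments` and `version` keys and their values are illustrative assumptions, not taken from either README version.

```ruby
# Hedged sketch only: keys and values beyond `name` are illustrative assumptions.
spark = Elasticity::Application.new({
  name: 'Spark',
  arguments: '--webui-port 18080', # hypothetical Spark argument
  version: '1.6.1'                 # hypothetical version pin
})

# The "simple way" above passes a name string; the verbose form presumably
# passes the Application object itself.
jobflow.add_application(spark)
```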
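To see how the newly documented step types fit into the overall flow, here is a minimal end-to-end sketch. The bucket, jar path, and class name are placeholders, and the bare `Elasticity::JobFlow.new` plus `jobflow.run` usage assumes the configure-based credential setup shown in the last hunk.

```ruby
require 'elasticity'

# Illustrative sketch; 's3://my-bucket/my-app.jar' and 'com.example.MyJob'
# are placeholders, not values from the README.
jobflow = Elasticity::JobFlow.new
jobflow.release_label = '4.3.0'
jobflow.add_application('Spark')

spark_step = Elasticity::SparkStep.new('s3://my-bucket/my-app.jar', 'com.example.MyJob')
spark_step.spark_arguments = { 'driver-memory' => '2G' }
spark_step.app_arguments   = { 'arg1' => 'value1' }

jobflow.add_step(spark_step)

jobflow_id = jobflow.run # submit the job flow to EMR and capture its id
```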
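The "Upload Assets" paragraph in the second hunk is cut off before its code block. As a rough sketch of the feature it describes, the upload helper might be used as below; the `SyncToS3` constructor signature and the bucket/paths are assumptions and should be checked against the full README.

```ruby
# Assumption: SyncToS3 takes a bucket name and syncs a local file or
# directory into a remote prefix. Bucket and paths are placeholders.
s3 = Elasticity::SyncToS3.new('my-asset-bucket')
s3.sync('assets/join_tables.tsv', 'emr/resources')
```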
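Finally, the last hunk only adds a passing comment about federated identity. Pulling the surrounding lines together, a minimal configuration that actually uses a temporary security token might look like the sketch below; every setting shown appears in the hunk, and uncommenting `security_token` is the only liberty taken.

```ruby
Elasticity.configure do |config|
  # AWS credentials
  config.access_key = ENV['AWS_ACCESS_KEY_ID']
  config.secret_key = ENV['AWS_SECRET_ACCESS_KEY']

  # Only needed with federated Identity Management / temporary credentials
  config.security_token = ENV['AWS_SECURITY_TOKEN']
end
```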