README.md in elasticity-6.0.10 vs README.md in elasticity-6.0.11
- old
+ new
@@ -139,11 +139,11 @@
### EMR Applications (optional, requires release_label >= 4.0.0)
With the release of EMR 4.0.0 you can now supply applications which EMR will install for you on boot (rather than via a manual bootstrap action, which you can still use if required). You must set the `release_label` for the jobflow (>= 4.0.0).
```ruby
-jobflow.release_label = '4.3.0'
+jobflow.release_label = '4.3.0'
# the simple way
jobflow.add_application("Spark") # Pig, Hive, Mahout
# more verbose
spark = Elasticity::Application.new({
name: 'Spark',
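  # (remaining keys are elided here; as a hedged sketch, the hash can also carry
  #  version/arguments/additional_info fields mirroring the EMR Application API)
})
jobflow.add_application(spark) # assumed to accept an Application instance as well as a name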
@@ -339,13 +339,47 @@
jobflow.add_step(copy_step)
# For AMI < 4.x you need to specify the legacy argument
copy_step = Elasticity::S3DistCpStep.new(true)
+```
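
The copy itself is driven by the step's arguments. A hedged sketch, assuming `S3DistCpStep` exposes an `arguments` accessor like the gem's other custom JAR steps, with hypothetical source and destination paths:

```ruby
# Hypothetical copy from S3 into HDFS; --src and --dest are standard S3DistCp options
copy_step = Elasticity::S3DistCpStep.new
copy_step.arguments = [
  '--src',  's3://my-bucket/logs/',
  '--dest', 'hdfs:///input/logs/'
]
jobflow.add_step(copy_step)
```
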
+### Adding a Scalding Step
+
+```ruby
+scalding_step = Elasticity::ScaldingStep.new('jar_location', 'main_class_fqcn', { 'arg1' => 'value1' })
+
+jobflow.add_step(scalding_step)
```
+This will result in the following command line arguments:
+
+```bash
+main_class_fqcn --hdfs --arg1 value1
+```
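
Each entry of the argument hash is appended as a `--key value` pair after `--hdfs`, so a step with several arguments (hypothetical jar, class, and paths below) expands the same way:

```ruby
# Hypothetical job: every hash entry becomes a --key value pair on the command line
scalding_step = Elasticity::ScaldingStep.new(
  's3://my-bucket/jobs/wordcount.jar', # assumed jar location
  'com.example.WordCountJob',          # assumed fully qualified class name
  { 'input' => 's3://my-bucket/in', 'output' => 's3://my-bucket/out' }
)
jobflow.add_step(scalding_step)
# => com.example.WordCountJob --hdfs --input s3://my-bucket/in --output s3://my-bucket/out
```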
+
+### Adding a Spark Step
+
+```ruby
+spark_step = Elasticity::SparkStep.new('jar_location', 'main_class_fqcn')
+
+# Specifying arguments relative to Spark
+spark_step.spark_arguments = { 'driver-memory' => '2G' }
+# Specifying arguments relative to your application
+spark_step.app_arguments = { 'arg1' => 'value1' }
+```
+
+This will be equivalent to the following script:
+
+```bash
+spark-submit \
+ --driver-memory 2G \
+ --class main_class_fqcn \
+ jar_location \
+ --arg1 value1
+```
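
As with the other step types, the Spark step is then queued on the jobflow with `add_step`. A minimal end-to-end sketch, using hypothetical bucket, jar, and class names:

```ruby
spark_step = Elasticity::SparkStep.new('s3://my-bucket/jobs/app.jar', 'com.example.Main') # assumed locations
spark_step.spark_arguments = { 'driver-memory' => '2G', 'executor-memory' => '4G' } # options handed to spark-submit
spark_step.app_arguments   = { 'input' => 's3://my-bucket/in' }                     # options handed to your main class
jobflow.add_step(spark_step)
```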
+
## 7 - Upload Assets (optional)
This isn't part of ```JobFlow```; it's more of an aside. Elasticity provides a very basic means of uploading assets to S3 so that your EMR job has access to them. Most commonly this will be a set of resources needed to run the job (e.g. JAR files, streaming scripts) and a set of resources used by the job itself (e.g. a TSV file with a range of valid values, join tables).
```ruby
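# Hedged sketch (class and signatures assumed): point the gem's S3 sync helper at a
# bucket, then push a local directory of assets up to a remote path before running the jobflow.
# s3 = Elasticity::SyncToS3.new('my-bucket')  # assumed constructor; credentials may also be passed
# s3.sync('assets', 'job/assets')             # assumed method: local dir -> remote key prefix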
@@ -420,10 +454,10 @@
Elasticity.configure do |config|
# AWS credentials
config.access_key = ENV['AWS_ACCESS_KEY_ID']
config.secret_key = ENV['AWS_SECRET_ACCESS_KEY']
-
+
# If you use federated Identity Management
#config.security_token = ENV['AWS_SECURITY_TOKEN']
# If using Hive, it will be configured via the directives here
config.hive_site = 's3://bucket/hive-site.xml'