# Databricks Gem

## Overview

This gem provides access to the DBX APIs (Jobs and SQL) from Ruby applications.

## Installation

Add the following to your Gemfile, then run `bundle install`:

```ruby
gem 'dbx-api', '~>0.2.0'
```

## Usage

Set up your `.env` file (optional):

```bash
# .env
DBX_HOST=DBX_CONNECTION_URL
DBX_TOKEN=YOUR_TOKEN_HERE
DBX_WAREHOUSE_ID=WAREHOUSE_ID_HERE
```

Running SQL from a Ruby script:

```ruby
require 'dbx'

# If using a .env file
sql_runner = DatabricksGateway.new

# If not using .env file...
sql_runner = DatabricksGateway.new(host: 'DBX_CONNECTION_STRING', token: 'DBX_ACCESS_TOKEN', warehouse: 'DBX_SQL_WAREHOUSE_ID')

# Basic sql
response = sql_runner.run_sql("SELECT 1")
response.results # => [{"1"=>"1"}]

# Dummy data in public DBX table
response = sql_runner.run_sql("SELECT * FROM samples.nyctaxi.trips LIMIT 1")
response.results
# => [{"tpep_pickup_datetime"=>"2016-02-14T16:52:13.000Z",
#      "tpep_dropoff_datetime"=>"2016-02-14T17:16:04.000Z",
#      "trip_distance"=>"4.94",
#      "fare_amount"=>"19.0",
#      "pickup_zip"=>"10282",
#      "dropoff_zip"=>"10171"}]
```

`run_sql` returns an object of type `DatabricksSQLResponse`. The response object has a few useful methods. For a complete list, see the class definition: `lib/dbx/databricks/sql_response.rb`

```ruby
response = sql_runner.run_sql("SELECT 1")

# checking the status of a response
response.status # => SUCCEEDED | FAILED | PENDING | RUNNING
response.failed? # => Boolean
response.success? # => Boolean

# getting the results of a response
response.results # => Array of Hashes

# looking at the raw response
response.raw_response # => HTTP object

# or just the parsed body of the HTTP response
response.body

# checking error messages for failed responses
response.error_message # => String
```

This gem does not prescribe how errors should be handled. `results` always returns an array, even if the query fails (it returns `[]` when `failed?` is true).
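Because the error-handling policy is left to the caller, one option is a small wrapper that raises when a query did not succeed. The following is a minimal sketch, not part of the gem: `QueryFailedError`, `results_or_raise`, and `StubResponse` are hypothetical names introduced here, and the stub only imitates the `success?`, `results`, and `error_message` methods described above so that the example runs without a warehouse connection.

```ruby
# Hypothetical error class; the gem itself does not define it.
class QueryFailedError < StandardError; end

# Hypothetical helper: returns the rows on success, raises otherwise.
# `response` is any object that answers success?, results, and error_message,
# as DatabricksSQLResponse is described as doing.
def results_or_raise(response)
  return response.results if response.success?

  raise QueryFailedError, "query failed: #{response.error_message}"
end

# Stand-in for DatabricksSQLResponse (an assumption for illustration only).
StubResponse = Struct.new(:status, :results, :error_message) do
  def success?
    status == "SUCCEEDED"
  end
end

ok  = StubResponse.new("SUCCEEDED", [{ "1" => "1" }], nil)
bad = StubResponse.new("FAILED", [], "syntax error")

p results_or_raise(ok)

begin
  results_or_raise(bad)
rescue QueryFailedError => e
  puts e.message
end
```

With the real gem, the same helper would presumably be called as `results_or_raise(sql_runner.run_sql("SELECT 1"))`. Raising is only one possible policy; returning `[]` and logging the message is equally valid.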
Users may wish to check the status of the response before attempting to access the results. For example:

```ruby
require 'dbx'

sql_runner = DatabricksGateway.new
res = sql_runner.run_sql("SELECT 1")

# do something with the results if the query succeeded
return res.results if res.success?

# do something else if the query failed
puts "query failed: #{res.error_message}"
```

Since `run_sql` returns an instance of `DatabricksSQLResponse`, you can also chain methods together:

```ruby
sql_runner.run_sql("SELECT 1").results
```

## Development

- After checking out the repo, run `bin/setup` to install dependencies.
- Set up your `.env` file as described above.
- Run `rake spec` to run the rspec tests.

## Build

- Run `gem build dbx.gemspec` to build the gem.
- Run `gem push dbx-api-0.2.0.gem` to push the gem to rubygems.org.
- Requires logging in to rubygems.org first via `gem login`.