# InstDataShipper

This gem is intended to facilitate easy upload of LTI datasets to Instructure Hosted Data.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'inst_data_shipper'
```

Then run the migrations:

```
bundle exec rake db:migrate
```

## Usage

### Dumper

The main tool provided by this Gem is the `InstDataShipper::Dumper` class. It is used to define a "Dump", which is a combination of tasks and a schema.

Here is an example `Dumper` implementation, wrapped in an ActiveJob job:

```ruby
class HostedDataPushJob < ApplicationJob
  # The schema serves two purposes: defining the schema and mapping data
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # You can augment the Table-builder DSL with custom methods like so:
    extend_table_builder do
      # It may be useful to define custom column-definition helpers:
      def custom_column(*args, from: nil, **kwargs, &blk)
        # In this example, the helper reads the value from a `data` jsonb column - without it, you'd need
        # to define `from: ->(row) { row.data["<key>"] }` on each column that needs to read from the jsonb
        from ||= args[0].to_s
        if from.is_a?(String)
          # Capture the key in its own variable so the lambda doesn't close over the reassigned `from`
          key = from
          from = ->(row) { row.data[key] }
        end
        column(*args, **kwargs, from: from, &blk)
      end

      # `extend_table_builder` uses `class_eval`, so you could alternatively write your helpers
      # in a Concern or Module and include them like normal:
      include SomeConcern
    end

    table(ALocalModel, "<description>") do
      # If you define a table as incremental, it'll only export changes made since the start of the last successful Dumper run.
      # The first argument "scope" can be interpreted in different ways:
      #   If exporting a local model it may be a: (default: `updated_at`)
      #     Proc that will receive a Relation and return a Relation (use `incremental_since`)
      #     String of a column to compare with `incremental_since`
      #   If exporting a Canvas report it may be a: (default: `updated_after`)
      #     Proc that will receive report params and return modified report params (use `incremental_since`)
      #     String of a report param to set to `incremental_since`
      # `on:` is passed to Hosted Data and is used as the unique key. It may be an array to form a composite key.
      # `if:` may be a Proc or a Symbol (of a method on the Dumper)
      incremental "updated_at", on: [:id], if: ->() {}

      # Schemas may declaratively define the data source.
      # This can be used for basic schemas where there's a 1:1 mapping between source table and
      # destination table, and there is no conditional logic that needs to be performed.
      # In order to apply these statements, your Dumper must call `auto_enqueue_from_schema`.
      source :local_table
      # A Proc can also be passed. The below is equivalent to the above.
      source ->(table_def) { import_local_table(table_def[:model] || table_def[:warehouse_name]) }

      column :name_in_destinations, :maybe_optional_sql_type, "Optional description of column"

      # The type may usually be omitted if the `table()` is passed a Model class, but strings are an exception to this
      custom_column :name, :"varchar(128)"

      # `from:` may be...
      # A Symbol of a method to be called on the record:
      custom_column :sis_type, :"varchar(32)", from: :some_model_method
      # A String of a column to read from the record:
      custom_column :sis_type, :"varchar(32)", from: "sis_source_type"
      # A Proc to be called with each record:
      custom_column :sis_type, :"varchar(32)", from: ->(rec) { ... }
      # Not specified. Will default to using the Schema Column Name as a String ("sis_type" in this case):
      custom_column :sis_type, :"varchar(32)"
    end

    table("my_table", model: ALocalModel) do
      # ...
    end

    table("proserv_student_submissions_csv") do
      column :canvas_id, :bigint, from: "canvas user id"
      column :sis_id, :"varchar(64)", from: "sis user id"
      column :name, :"varchar(64)", from: "user name"
      column :submission_id, :bigint, from: "submission id"
    end
  end

  Dumper = InstDataShipper::Dumper.define(schema: SCHEMA, include: [
    InstDataShipper::DataSources::LocalTables,
    InstDataShipper::DataSources::CanvasReports,
  ]) do
    import_local_table(ALocalModel)
    import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))

    # If the report_name/Model don't directly match the Schema, a schema_name: parameter may be passed:
    import_local_table(SomeModel, schema_name: "my_table")
    import_canvas_report_by_terms("some_report", terms: Term.all.pluck(:canvas_id), schema_name: "my_table")

    # Iterate through the Tables defined in the Schema and apply any defined `source` statements.
    # This is the default behavior if `define()` is called w/o a block.
    auto_enqueue_from_schema
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<token>@<hosted_data_domain>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>",
    ])
  end
end
```

`Dumper`s may also be formed as a normal Ruby subclass:

```ruby
class HostedDataPushJob < ApplicationJob
  SCHEMA = InstDataShipper::SchemaBuilder.build do
    # ...
  end

  class Dumper < InstDataShipper::Dumper
    include InstDataShipper::DataSources::LocalTables
    include InstDataShipper::DataSources::CanvasReports

    def enqueue_tasks
      import_local_table(ALocalModel)
      import_canvas_report_by_terms("proserv_student_submissions_csv", terms: Term.all.pluck(:canvas_id))
      # auto_enqueue_from_schema
    end

    def table_schemas
      SCHEMA
    end
  end

  def perform
    Dumper.perform_dump([
      "hosted-data://<token>@<hosted_data_domain>?table_prefix=example",
      "s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>",
    ])
  end
end
```

### Destinations

This Gem is mainly designed for use with Hosted Data, but it tries to abstract that a little to allow for other destinations/backends. Out of the box, support for Hosted Data and S3 is included.

Destinations are passed as URI-formatted strings. Passing Hashes is also supported, but the format/keys are destination specific.

Destinations blindly accept URI Fragments (the `#` chunk at the end of the URI). These options are not used internally but will be made available as `dest.user_config`. Ideally these are in the same format as query parameters (`x=1&y=2`, which it will try to parse into a Hash), but it can be any string.
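For illustration, here is a minimal sketch of carrying app-specific options in a fragment (the `mode=full` option and the `<token>`/`<hosted_data_domain>` values are hypothetical placeholders):

```ruby
# The fragment (`#mode=full`) is not used by the upload itself; assuming it is
# formatted like query parameters, it is parsed into a Hash and exposed to your
# own code as `dest.user_config` (e.g. { "mode" => "full" }).
Dumper.perform_dump([
  "hosted-data://<token>@<hosted_data_domain>?table_prefix=example#mode=full",
])
```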
#### Hosted Data

`hosted-data://<token>@<hosted_data_domain>`

##### Optional Parameters:

- `table_prefix`: An optional string to prefix onto each table name in the schema when declaring the schema in Hosted Data

#### S3

`s3://<access_key_id>:<access_key_secret>@<region>/<bucket>/<path>`

##### Optional Parameters:

_None_

## Development

When adding to or updating this gem, make sure you do the following:

- Update the yardoc comments where necessary, and confirm the changes by running `yard server --reload`
- Write specs
- If you modify the model or migration templates, run `bundle exec rake update_test_schema` to update them in the Rails Dummy application (and commit those changes)

## Docs

Docs can be generated using [yard](https://yardoc.org/). To view the docs:

- Clone this gem's repository
- `bundle install`
- `yard server --reload`

The yard server will give you a URL you can visit to view the docs.