# CanvasSync

This gem is intended to facilitate fast and easy syncing of Canvas data.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'canvas_sync'
```

Models and migrations can be installed using the following generator:

```
bin/rails generate canvas_sync:install --models users,terms,courses
```

Use the `--models` option to specify which models you would like installed. This will add both the model files and their corresponding migrations. If you'd like to install all the models that `CanvasSync` supports, specify `--models all`.

Then run the migrations:

```
bundle exec rake db:migrate
```

For a list of currently supported models, see `CanvasSync::SUPPORTED_MODELS`.

Additionally, your Canvas instance must have the "Proserv Provisioning Report" enabled.

The following custom reports are required for the specified models:

- assignments = "Assignments Report" (proserv_assignment_export_csv)
- submissions = "Student Submissions" (proserv_student_submissions_csv)
- assignment_groups = "Assignment Group Export" (proserv_assignment_group_export_csv)
- context_modules = "Professional Services Context Modules Report" (proserv_context_modules_csv)
- context_module_items = "Professional Services Context Module Items Report" (proserv_context_module_items_csv)

## Prerequisites

### Postgres

Bulk inserting relies on a Postgres upsert. Because of this, you need to be using **Postgres 9.5** or above.

### Sidekiq

Make sure you've set up Sidekiq to work properly with ActiveJob as [outlined here](https://github.com/mperham/sidekiq/wiki/Active-Job).

### Apartment

If using Apartment and Sidekiq, make sure you include the [apartment-sidekiq](https://github.com/influitive/apartment-sidekiq) gem so that the jobs are run in the correct tenant.

## Basic Usage

Your tool must have an `ActiveJob` compatible job queue adapter configured, such as DelayedJob or Sidekiq.
Additionally, you must have a method called `canvas_sync_client` defined in an initializer that returns a Bearcat client for the Canvas instance you are syncing against. Example:

```ruby
# config/initializers/canvas_sync.rb
def canvas_sync_client
  Bearcat::Client.new(token: current_organization.settings[:api_token], prefix: current_organization.settings[:base_url])
end
```

(Having the client defined here means the sensitive API token doesn't have to be passed in plain text between jobs.)

Once that's done and you've used the generator to create your models and migrations, you can run the standard provisioning sync:

```ruby
CanvasSync.provisioning_sync(<array of models to sync>, term_scope: <optional term scope>)
```

*Note: pass in 'xlist' to your array of models if you would like sections to include cross listing information*

Example:

```ruby
CanvasSync.provisioning_sync(['users', 'courses'], term_scope: :active)
```

This will kick off a string of jobs to sync your specified models.

If you pass in the optional `term_scope`, the provisioning reports will be run for only the terms returned by that scope. The scope must be defined on your `Term` model. (A sample one is provided in the generated `Term`.)

Imports are inserted in bulk with [activerecord-import](https://github.com/zdennis/activerecord-import) so they should be very fast.

## Advanced Usage

This gem also helps with syncing and processing other reports if needed.
In order to do so, you must:

- Define a `Processor` class that implements a `process` method for handling the results of the report
- Integrate your reports with the `ReportStarter`
- Tell the gem what jobs to run

### `updated_after`

An `updated_after` param may be passed when triggering a provision or making a chain:

```ruby
CanvasSync.default_provisioning_report_chain(
  %i[list of models to sync],
  updated_after: false
)
```

It may be one of the following values:

* `false` - Will not apply any `updated_after` filtering to the requested reports
* An ISO-8601 Date - Will pass the supplied date as the `updated_after` param for the requested reports
* `true` (Default) - Will use the start date of the last successful sync

### Extensible chain

It is sometimes desired to extend or customize the chain of jobs that are run with CanvasSync. This can be achieved with the following pattern:

```ruby
chain = CanvasSync.default_provisioning_report_chain(
  %i[list of models to sync]
)

# Add a custom job to the end of the chain.
chain << { job: CanvasSyncCompleteWorker, parameters: [{ job_id: job.id }] }
# If an options key is provided, it will be automatically appended to the end of the :parameters array
chain << { job: CanvasSyncCompleteWorker, options: { job_id: job.id } }

chain.process!

# The chain object provides a fairly extensive API:
chain.insert({ job: SomeOtherJob }) # Adds the job to the end of the chain
chain.insert_at(0, { job: SomeOtherJob }) # Adds the job to the beginning of the chain
chain.insert({ job: SomeOtherJob }, after: 'CanvasSync::Jobs::SyncTermsJob') # Adds the job right after the SyncTermsJob
chain.insert({ job: SomeOtherJob }, before: 'CanvasSync::Jobs::SyncTermsJob') # Adds the job right before the SyncTermsJob
chain.insert({ job: SomeOtherJob }, with: 'CanvasSync::Jobs::SyncTermsJob') # Adds the job to be performed concurrently with the SyncTermsJob

# Some Jobs (such as the SyncTermsJob) have a sub-chain for, eg, Courses.
# chain.insert is aware of these sub-chains and will recurse into them when looking for a before:/after:/with: reference

# Adds the job to be performed after SyncCoursesJob (which is a sub-job of the terms job and is duplicated for each term in the term_scope:)
chain.insert({ job: SomeOtherJob }, after: 'CanvasSync::Jobs::SyncCoursesJob')

# You can also retrieve the sub-chain like so:
chain.get_sub_chain('CanvasSync::Jobs::SyncTermsJob')
```

### Processor

Your processor class must implement a `process` class method that receives a `report_file_path` and a hash of `options`. (See the `CanvasSync::Processors::ProvisioningReportProcessor` for an example.)

The gem handles the work of enqueueing and downloading the report and then passes the file path to your class to process as needed. A simple example might be:

```ruby
class MyCoolProcessor
  def self.process(report_file_path, options)
    puts "I downloaded a report to #{report_file_path}! Isn't that neat!"
  end
end
```

### Report starter

You must implement a job that will enqueue a report starter for your report. (TODO: would be nice to make some sort of builder for this, so you just define the report and its params and then the gem runs it in a pre-defined job.)

Let's say we have a custom Canvas report called "my_really_cool_report_csv". First, we would need to create a job class that will enqueue a report starter. To work with the `CanvasSync` interface, your class must accept 2 parameters: `job_chain`, and `options`.

```ruby
class MyReallyCoolReportJob < CanvasSync::Jobs::ReportStarter
  def perform(options)
    super(
      'my_really_cool_report_csv', # Report name
      { "parameters[param1]" => true }, # Report parameters
      MyCoolProcessor.to_s, # Your processor class as a string
      options
    )
  end
end
```

You can also see examples in `lib/canvas_sync/jobs/sync_users_job.rb` and `lib/canvas_sync/jobs/sync_provisioning_report.rb`.
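
As a slightly fuller sketch of the `process` contract described in the Processor section, the hypothetical processor below reads the downloaded CSV with Ruby's standard `csv` library and tallies rows by the value in one column. The class name `StatusCountProcessor`, the `status` header, and the `:column` option are all assumptions made for illustration. None of them are part of CanvasSync, and a real report may use different headers.

```ruby
require "csv"

# Hypothetical processor: tallies report rows by the value in one column.
class StatusCountProcessor
  # Follows the documented contract: a class method that receives the
  # downloaded file path and an options hash.
  # The :column option is an assumption for this sketch, not a CanvasSync option.
  def self.process(report_file_path, options)
    column = options.fetch(:column, "status")
    counts = Hash.new(0)

    # Read the file row by row so a large report is not loaded into memory at once.
    CSV.foreach(report_file_path, headers: true) do |row|
      counts[row[column]] += 1
    end

    counts
  end
end
```

A job like `MyReallyCoolReportJob` would then pass `StatusCountProcessor.to_s` as its processor argument. In a real tool the body of the loop would more likely persist each row than count it.
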
### Batching

The provisioning report uses the `CanvasSync::Importers::BulkImporter` class to bulk import rows with the activerecord-import gem. It inserts rows in batches of 10,000 by default. This can be customized by setting the `BULK_IMPORTER_BATCH_SIZE` environment variable.

### Mapping Overrides

Overrides are useful for two scenarios:

- You have an existing application where the column names do not match up with what CanvasSync expects
- You want to sync some other column in the report that CanvasSync is not configured to sync

In order to create an override, place a file called `canvas_sync_provisioning_mapping.yml` in your Rails `config` directory. Define the tables and columns you want to override using the following format:

```yaml
users:
  conflict_target: canvas_user_id # This must be a unique field that is present in the report and the database
  report_columns: # The keys specified here are the column names in the report CSV
    canvas_user_id_column_name_in_report:
      database_column_name: canvas_user_id_name_in_your_db # Sometimes the database column name might not match the report column name
      type: integer
```

### API Sync

Several models implement the `ApiSyncable` Concern. This is done in the Model Templates so as to be customizable and tweakable.
Models that `include CanvasSync::Concerns::ApiSyncable` should also call the `api_syncable` class method to configure the synchronization.
`api_syncable` takes two arguments, an optional options Hash, and an optional block callback:

```ruby
class CanvasSyncModel < ApplicationRecord
  include CanvasSync::Concerns::ApiSyncable

  api_syncable(
    {
      # api_response[:response_field] will be mapped to local_field on the model.
      local_field: :response_field,
      # A calculated result will be mapped to other_local_field on the model.
      # The lambda is executed in the context of the model instance.
      other_local_field: -> (api_response) { api_response[:some_field] + 5 },
    },
    # A lambda, executed in the context of the model instance, to actually make the API call.
    # Should accept 0 or 1 parameters. Must accept 0 parameters if your `canvas_sync_client` requires an `account_id`
    -> (bearcat) { bearcat.some_request(some_model_getter) },
    { # An optional options Hash
      # Action to take when a 404 is received from the API. May be a Hash that will be merged into the Model,
      # a Symbol that should be sent to the model, or a lambda (both taking 0 arguments)
      mark_deleted: { workflow_state: 'deleted' },
    }
  ) do |api_response, mapped_fields| # Must accept 1-2 parameters
    # Override behavior for actually applying the response to the model instance
  end

  def something
    # ApiSyncable models add several instance methods:
    params = request_from_api( # Starts an API request and returns the params
      retries: 3, # Number of times to retry the API call before failing
    )
    update_from_api_params(params) # Merge the API response into the model instance
    update_from_api_params!(params) # Merge and save! if changed
    sync_from_api( # Starts an API request and calls save! (if changed)
      retries: 3, # Number of times to retry the API call before failing
    )
  end
end
```

### Job Batching

CanvasSync adds a `CanvasSync::JobBatches` module. It adds Sidekiq/sidekiq-batch like support for Job Batches.
It integrates automatically with both Sidekiq and ActiveJob. The API is highly similar to the Sidekiq-batch implementation,
documentation for which can be found at https://github.com/mperham/sidekiq/wiki/Batches

A batch can be created using `Sidekiq::Batch` or `CanvasSync::JobBatches::Batch`.

Also see `canvas_sync/jobs/begin_sync_chain_job`, `canvas_sync/job_batches/jobs/serial_batch_job`, or `canvas_sync/job_batches/jobs/concurrent_batch_job` for example usage.

## Legacy Support

### Legacy Mappings

CanvasSync 0.10.0+, by default, changes Canvas primary-keys from `:canvas_MODEL_id` to just `:canvas_id`.
Because CanvasSync primarily consists of templates, this change shouldn't require any large changes in your app, but you will need to apply the `model_mappings_legacy.yml` (located in the root of this repo) to your model mappings - see [Mapping Overrides](#mapping-overrides).

### Row-by-Row Syncing

If you have an old-style tool that needs to sync data on a row-by-row basis, you can pass in the `legacy_support: true` option. In order for this to work, your models must have a `create_or_update_from_csv` class method defined that accepts a row argument. This method will get passed each row from the CSV, and it's up to you to persist it.

Example:

```ruby
CanvasSync.provisioning_sync(['users', 'courses'], term_scope: :active, legacy_support: true)
```

You may also provide an array of model names. Doing so will only provide legacy support for the specified models.

```ruby
CanvasSync.provisioning_sync(['users', 'courses'], term_scope: :active, legacy_support: ['courses'])
```

In the above example, users will sync normally while courses will require a `create_or_update_from_csv` method.

## CanvasSync::JobLog

Running the migrations will create a `canvas_sync_job_logs` table. All the jobs written in this gem will create a `CanvasSync::JobLog` and store data about their arguments, job class, any exceptions, and start/completion time. This will work regardless of your queue adapter.

If you want your own jobs to also log to the table, all you have to do is have your job class inherit from `CanvasSync::Job`. You can also persist extra data you might need later by saving to the `metadata` column:

```
@job_log.metadata = "This job ran really well!"
@job_log.save!
```

If you want to be able to utilize the `CanvasSync::JobLog` without `ActiveJob` (so you can get access to `Sidekiq` features that `ActiveJob` doesn't support), then add the following to an initializer in your Rails app:

```
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add CanvasSync::Sidekiq::Middleware
  end
end
```

## Synchronize different reports

CanvasSync provides the functionality to import data from other reports into a specific table.

This can be achieved by using the following method:

```ruby
chain = CanvasSync.default_provisioning_report_chain
chain << {
  job: CanvasSync::Jobs::SyncSimpleTableJob,
  options: {
    report_name: <report name>,
    model: <model to sync>,
    params: <report params>
  },
}
chain.process!
```

## Configuration

You can configure CanvasSync settings by doing the following:

```
CanvasSync.configure do |config|
  config.classes_to_only_log_errors_on << "ClassToOnlyLogErrorsOn"
end
```

Available config options (if you add more, please update this!):

* `config.classes_to_only_log_errors_on` - use this if you are utilizing the `CanvasSync::JobLog` table, but want certain classes to only persist in the `job_logs` table if an error is encountered. This is useful if you've got a very frequently used job that's filling up your database, and only really care about tracking failures.

## Handling Job errors

If you need custom handling for when a CanvasSync Job fails, you can add an `:on_failure` option to your Job Chain's `:global_options`.
The value should be a String in the following format: `ModuleOrClass::AnotherModuleOrClass.class_method`.
The given method of the given class will be called when an error occurs.
The handling method should accept 2 arguments: `[error, **options]`

The current parameters provided in `**options` are:

- `job_chain`
- `job_log`

Example:

```ruby
class CanvasSyncStarterWorker
  def perform
    job_chain = CanvasSync.default_provisioning_report_chain(
      %w[desired models],
      options: {
        global: {
          on_failure: 'CanvasSyncStarterWorker.handle_canvas_sync_error',
        }
      }
    )
  end

  def self.handle_canvas_sync_error(error, **options)
    # Do Stuff
  end
end
```

## Upgrading

Re-running the generator when there's been a gem change will give you several choices if it detects conflicts between your local files and the updated generators. You can either view a diff or allow the generator to overwrite your local file. In most cases you may just want to add the code from the diff yourself so as not to break any of your customizations.

Additionally, if there have been schema changes to an existing model you may have to run your own migration to bring it up to speed.

If you make updates to the gem please add any upgrade instructions here.

## Integrating with existing applications

In order for this to work properly your database tables will need to have at least the columns defined in this gem. (Adding additional columns is fine.) As such, you may need to run some migrations to rename existing columns or add missing ones. The generator only works well in a situation where that table does not already exist. Take a look at the migration templates in `lib/canvas_sync/generators/templates` to see what you need.

## Development

When adding to or updating this gem, make sure you do the following:

- Update the yardoc comments where necessary, and confirm the changes by running `yardoc --server`
- Write specs
- If you modify the model or migration templates, run `bundle exec rake update_test_schema` to update them in the Rails Dummy application (and commit those changes)

## Docs

Docs can be generated using [yard](https://yardoc.org/).
To view the docs:

- Clone this gem's repository
- `bundle install`
- `yard server --reload`

The yard server will give you a URL you can visit to view the docs.