# Navigation

Wayfarer has two mechanisms for navigating crawls:

* Jobs have a router that decides whether a task's URL gets fetched and processed.
* Jobs can add URLs to a processing set with `#stage`.

## Staging URLs

Jobs can turn URLs into tasks within their own batch with `#stage`. Staging a
URL does not enqueue it immediately. Instead, the URL is added to a processing
set first.

```ruby
class DummyJob < Wayfarer::Base
  route { to :index }

  def index
    stage page.meta.links.all
  end
end
```

Once the `index` action method returns, all URLs in `page.meta.links.all`
are (1) normalized to a canonical form and (2) checked for inclusion in
the batch's processed URL Redis set. All unprocessed URLs are enqueued as
tasks within the same batch.
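The two steps can be modelled in a few lines of plain Ruby. This is a minimal sketch, not Wayfarer's implementation: the `ProcessedUrls` class is hypothetical, an in-memory `Set` stands in for the Redis set, and the normalization shown (lowercasing scheme and host, dropping the fragment) is only an assumption about what a canonical form might involve.

```ruby
require "set"
require "uri"

# Hypothetical in-memory stand-in for a batch's processed-URL set.
# Wayfarer keeps this set in Redis; a Ruby Set is used here for illustration.
class ProcessedUrls
  def initialize
    @seen = Set.new
  end

  # Illustrative normalization: lowercase scheme and host, drop the fragment.
  def canonical(url)
    uri = URI.parse(url).normalize
    uri.fragment = nil
    uri.to_s
  end

  # Returns the normalized URLs not seen before and records them as seen.
  # Set#add? returns nil when the element is already present.
  def unprocessed(urls)
    urls.map { |url| canonical(url) }.select { |url| @seen.add?(url) }
  end
end

batch = ProcessedUrls.new
first = batch.unprocessed(["https://Example.com/a#top", "https://example.com/a"])
second = batch.unprocessed(["https://example.com/a", "https://example.com/b"])
```

Here `first` contains a single URL because both inputs normalize to the same string, and `second` contains only the `/b` URL because `/a` was already recorded.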

`#stage` can be called any number of times, including with invalid URLs, which
are filtered out behind the scenes:

```ruby
def index
  stage "_bro:ken@url/" # => ["_bro:ken@url/"]
end
```
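How Wayfarer decides that a URL is invalid is not specified here. As a rough sketch of such filtering using only Ruby's standard library, the hypothetical helper `fetchable?` below accepts a URL only if it parses and is an absolute HTTP(S) URL with a host:

```ruby
require "uri"

# Hypothetical helper: true only for parseable, absolute HTTP(S) URLs
# that have a host. URI::HTTPS is a subclass of URI::HTTP.
def fetchable?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

candidates = ["https://example.com/", "_bro:ken@url/", "http://exa mple.com/"]
valid = candidates.select { |url| fetchable?(url) }
```

Only the first candidate survives, so `valid` holds `"https://example.com/"` alone.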

See also: [Performance: Stage less URLs](/guides/performance)

!!! attention "Failing action methods do not enqueue tasks"

    If an action method fails as in:

    ```ruby
    def index
      stage page.meta.links.all
      fail "Error occurred"
    end
    ```

    None of the staged URLs are enqueued as tasks. Jobs that raise an exception
    should get retried, or the exception should be handled.

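The all-or-nothing behaviour can be pictured as a buffer that is flushed only when the action returns normally. The `StagingBuffer` class below is a hypothetical model written for this guide, not part of Wayfarer:

```ruby
# Hypothetical model of a job's staging buffer; not Wayfarer's implementation.
class StagingBuffer
  attr_reader :enqueued

  def initialize
    @staged = []
    @enqueued = []
  end

  def stage(*urls)
    @staged.concat(urls.flatten)
  end

  # Runs the action and enqueues staged URLs only if it returns normally.
  # If the action raises, the staged URLs are discarded and the error propagates.
  def perform
    yield self
    @enqueued.concat(@staged)
  ensure
    @staged.clear
  end
end

ok = StagingBuffer.new
ok.perform { |job| job.stage("https://example.com/a") }

failing = StagingBuffer.new
begin
  failing.perform do |job|
    job.stage("https://example.com/b")
    raise "Error occurred"
  end
rescue RuntimeError
  # A real job would be retried or would handle the exception here.
end
```

After this runs, `ok.enqueued` holds the staged URL, while `failing.enqueued` is empty.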

## Routing URLs

In the following example, the task is written to the message queue, but the
job's routes do not match the URL. When the task is consumed, the URL is not
fetched and no action method is called.

```ruby
class DummyJob < Wayfarer::Base
  route do
    host "example.com", path: "/users/:user_id", to: :user
  end

  # ...
end

DummyJob.crawl_later("https://mismatching.host/users/42")
```
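To make the matching step concrete, here is a simplified sketch of comparing a URL against a host and a path pattern with a `:user_id` placeholder. The `match_route` helper is hypothetical and much less capable than Wayfarer's router; the shape of the returned parameters is an assumption as well.

```ruby
require "uri"

# Hypothetical matcher for illustration only.
# Returns the captured path parameters on a match, nil otherwise.
def match_route(url, host:, path:)
  uri = URI.parse(url)
  return nil unless uri.host == host

  pattern_segments = path.split("/")
  url_segments = uri.path.split("/")
  return nil unless pattern_segments.size == url_segments.size

  params = {}
  pattern_segments.zip(url_segments).each do |pattern, actual|
    if pattern.start_with?(":")
      # A placeholder segment captures whatever the URL has in that position.
      params[pattern.delete_prefix(":")] = actual
    elsif pattern != actual
      return nil
    end
  end
  params
end

matching = match_route("https://example.com/users/42",
                       host: "example.com", path: "/users/:user_id")
mismatching = match_route("https://mismatching.host/users/42",
                          host: "example.com", path: "/users/:user_id")
```

In this sketch `matching` captures `user_id` as `"42"`, while `mismatching` is `nil` because the host differs, which corresponds to the case where the task is consumed without its URL being fetched.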

