Sha256: 00adfbf5657907277d20e1962567441be4bae42a2c7dc0c51539e26fdddf6ccf
Size: 1.26 KB
Versions: 3
Stored size: 1.26 KB
Contents
# Crawler filter plugin for Embulk

An Embulk filter plugin that crawls web pages, starting from the base URL stored in a column of each input record.

## Overview

* **Plugin type**: filter

## Configuration

- **target_key**: name of the column that holds the base URL (string, required)
- **max_depth_of_crawling**: maximum depth of crawling (integer, default: unlimited)
- **number_of_crawlers**: number of crawlers to run in parallel (integer, default: 1)
- **max_pages_to_fetch**: maximum number of pages to fetch (integer, default: unlimited)
- **crawl_storage_folder**: folder used for crawl storage (string, required)
- **politeness_delay**: politeness delay (integer, default: null)
- **user_agent_string**: user agent string (string, default: null)
- **output_prefix**: output prefix (string, default: "")
- **connection_timeout**: connection timeout in milliseconds (integer, default: 30000)
- **socket_timeout**: socket timeout in milliseconds (integer, default: 20000)

## Example

```yaml
in:
  type: mysql
  host: dbs04
  user: application
  password: XXXXXXXX
  database: iap
  query: |
    select url from companies limit 100
filters:
  - type: crawler
    target_key: url
    number_of_crawlers: 10
    max_depth_of_crawling: 4
    politeness_delay: 100
    crawl_storage_folder: "/tmp/crawl/%s"
out:
  type: stdout
```

## Build

```
$ ./gradlew gem  # -t to watch change of files and rebuild continuously
```
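
The options listed under Configuration can be combined in a single filter entry. The fragment below is only a sketch: the page limit, user agent string, output prefix, and timeout values are hypothetical, chosen to match the documented types, and it assumes each option behaves as its name suggests. The `crawl_storage_folder` value is copied from the example in this README.

```yaml
filters:
  - type: crawler
    target_key: url                         # column holding the base URL
    crawl_storage_folder: "/tmp/crawl/%s"   # required; value taken from the README example
    max_pages_to_fetch: 1000                # hypothetical limit (default: unlimited)
    user_agent_string: "my-crawler/1.0"     # hypothetical user agent (default: null)
    output_prefix: "crawl_"                 # hypothetical prefix (default: "")
    connection_timeout: 10000               # milliseconds (default: 30000)
    socket_timeout: 10000                   # milliseconds (default: 20000)
```

Only `target_key` and `crawl_storage_folder` are marked as required; every other key may be omitted to fall back to its documented default.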
Version data entries
3 entries across 3 versions & 1 rubygems
Version | Path |
---|---|
embulk-filter-crawler-0.1.3 | README.md |
embulk-filter-crawler-0.1.2 | README.md |
embulk-filter-crawler-0.1.1 | README.md |