.\" Generated by kramdown-man 0.1.8
.\" https://github.com/postmodern/kramdown-man#readme
.TH ronin-web-spider 1 "2022-01-01" "Ronin Web" "User Manuals"
.LP
.SH SYNOPSIS
.LP
.HP
\fBronin-web-spider\fR \[lB]\fIoptions\fP\[rB] \[lC]\fB--host\fR \fIHOST\fP \[or] \fB--domain\fR \fIDOMAIN\fP \[or] \fB--site\fR \fIURL\fP\[rC]
.LP
.SH DESCRIPTION
.LP
.PP
Spiders a website\.
.LP
.SH OPTIONS
.LP
.TP
\fB--open-timeout\fR \fISECS\fP
Sets the connection open timeout\.
.LP
.TP
\fB--read-timeout\fR \fISECS\fP
Sets the read timeout\.
.LP
.TP
\fB--ssl-timeout\fR \fISECS\fP
Sets the SSL connection timeout\.
.LP
.TP
\fB--continue-timeout\fR \fISECS\fP
Sets the continue timeout\.
.LP
.TP
\fB--keep-alive-timeout\fR \fISECS\fP
Sets the connection keep alive timeout\.
.LP
.TP
\fB-P\fR, \fB--proxy\fR \fIPROXY\fP
Sets the proxy to use\.
.LP
.TP
\fB-H\fR, \fB--header\fR \[lq]\fINAME\fP: \fIVALUE\fP\[rq]
Sets a default header\.
.LP
.TP
\fB--host-header\fR \fINAME\fP\[eq]\fIVALUE\fP
Sets a default header\.
.LP
.HP
\fB-u\fR, \fB--user-agent\fR chrome\-linux\[or]chrome\-macos\[or]chrome\-windows\[or]chrome\-iphone\[or]chrome\-ipad\[or]chrome\-android\[or]firefox\-linux\[or]firefox\-macos\[or]firefox\-windows\[or]firefox\-iphone\[or]firefox\-ipad\[or]firefox\-android\[or]safari\-macos\[or]safari\-iphone\[or]safari\-ipad\[or]edge
The \fBUser-Agent\fR to use\.
.LP
.TP
\fB-U\fR, \fB--user-agent-string\fR \fISTRING\fP
The raw \fBUser-Agent\fR string to use\.
.LP
.TP
\fB-R\fR, \fB--referer\fR \fIURL\fP
Sets the \fBReferer\fR URL\.
.LP
.TP
\fB--delay\fR \fISECS\fP
Sets the delay in seconds between each request\.
.LP
.TP
\fB-l\fR, \fB--limit\fR \fICOUNT\fP
Only spiders up to \fICOUNT\fP pages\.
.LP
.TP
\fB-d\fR, \fB--max-depth\fR \fIDEPTH\fP
Only spiders up to a maximum depth of \fIDEPTH\fP\.
.LP
.TP
\fB--enqueue\fR \fIURL\fP
Adds the URL to the queue\.
.LP
.TP
\fB--visited\fR \fIURL\fP
Marks the URL as previously visited\.
.LP
.TP
\fB--strip-fragments\fR
Enables\[sl]disables stripping the fragment component of every URL\.
.LP
.TP
\fB--strip-query\fR
Enables\[sl]disables stripping the query component of every URL\.
.LP
.TP
\fB--visit-host\fR \fIHOST\fP
Visit URLs with the matching host name\.
.LP
.HP
\fB--visit-hosts-like\fR \[sl]\fIREGEX\fP\[sl]
Visit URLs with host names that match the \fIREGEX\fP\.
.LP
.TP
\fB--ignore-host\fR \fIHOST\fP
Ignore the host name\.
.LP
.HP
\fB--ignore-hosts-like\fR \[sl]\fIREGEX\fP\[sl]
Ignore the host names matching the \fIREGEX\fP\.
.LP
.TP
\fB--visit-port\fR \fIPORT\fP
Visit URLs with the matching port number\.
.LP
.HP
\fB--visit-ports-like\fR \[sl]\fIREGEX\fP\[sl]
Visit URLs with port numbers that match the \fIREGEX\fP\.
.LP
.TP
\fB--ignore-port\fR \fIPORT\fP
Ignore the port number\.
.LP
.HP
\fB--ignore-ports-like\fR \[sl]\fIREGEX\fP\[sl]
Ignore the port numbers matching the \fIREGEX\fP\.
.LP
.TP
\fB--visit-link\fR \fIURL\fP
Visit the \fIURL\fP\.
.LP
.HP
\fB--visit-links-like\fR \[sl]\fIREGEX\fP\[sl]
Visit URLs that match the \fIREGEX\fP\.
.LP
.TP
\fB--ignore-link\fR \fIURL\fP
Ignore the \fIURL\fP\.
.LP
.HP
\fB--ignore-links-like\fR \[sl]\fIREGEX\fP\[sl]
Ignore URLs matching the \fIREGEX\fP\.
.LP
.TP
\fB--visit-ext\fR \fIFILE\[ru]EXT\fP
Visit URLs with the matching file extension\.
.LP
.HP
\fB--visit-exts-like\fR \[sl]\fIREGEX\fP\[sl]
Visit URLs with file extensions that match the \fIREGEX\fP\.
.LP
.TP
\fB--ignore-ext\fR \fIFILE\[ru]EXT\fP
Ignore URLs with the matching file extension\.
.LP
.HP
\fB--ignore-exts-like\fR \[sl]\fIREGEX\fP\[sl]
Ignore URLs with file extensions matching the \fIREGEX\fP\.
.LP
.TP
\fB-r\fR, \fB--robots\fR
Specifies whether to honor \fBrobots.txt\fR\.
.LP
.TP
\fB--host\fR \fIHOST\fP
Spiders the specific \fIHOST\fP\.
.LP
.TP
\fB--domain\fR \fIDOMAIN\fP
Spiders the whole \fIDOMAIN\fP\.
.LP
.TP
\fB--site\fR \fIURL\fP
Spiders the website, starting at the \fIURL\fP\.
.LP
.TP
\fB--print-status\fR
Print the status codes for each URL\.
.LP
.TP
\fB--print-headers\fR
Print response headers for each URL\.
.LP
.TP
\fB--print-header\fR \fINAME\fP
Prints a specific header\.
.LP
.TP
\fB--history\fR \fIFILE\fP
Sets the history file to write every visited URL to\.
.LP
.TP
\fB--archive\fR \fIDIR\fP
Archive every visited page to the \fIDIR\fP\.
.LP
.TP
\fB--git-archive\fR \fIDIR\fP
Archive every visited page to the git repository\.
.LP
.TP
\fB-X\fR, \fB--xpath\fR \fIXPATH\fP
Evaluates the XPath on each HTML page\.
.LP
.TP
\fB-C\fR, \fB--css-path\fR \fIXPATH\fP
Evaluates the CSS\-path on each HTML page\.
.LP
.TP
\fB-v\fR, \fB--verbose\fR
Enables verbose output\.
.LP
.TP
\fB-h\fR, \fB--help\fR
Print help information\.
.LP
.SH ENVIRONMENT
.LP
.TP
\fIHTTP\[ru]PROXY\fP
Sets the global HTTP proxy\.
.LP
.TP
\fIRONIN\[ru]HTTP\[ru]PROXY\fP
Sets the HTTP proxy for Ronin\.
.LP
.SH AUTHOR
.LP
.PP
Postmodern
.MT postmodern\.mod3\[at]gmail\.com
.ME
.LP
.SH SEE ALSO
.LP
.PP
ronin\-web\-server(1) ronin\-web\-proxy(1) ronin\-web\-diff(1) ronin\-web\-new\-spider(1)
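.LP
.SH EXAMPLES
.LP
.PP
A minimal sketch of typical invocations, using only options documented on this page\. The host \fBexample\.com\fR and the file name \fBvisited\.txt\fR are placeholders, and the numeric values are arbitrary\.
.LP
.PP
Spider a single host, stopping after 100 pages, waiting one second between requests, and printing the status code for each URL:
.LP
.nf
ronin\-web\-spider \-\-host example\.com \-\-limit 100 \-\-delay 1 \-\-print\-status
.fi
.LP
.PP
Spider a website starting at a URL, up to a maximum depth of 2, and write every visited URL to a history file:
.LP
.nf
ronin\-web\-spider \-\-site https://example\.com/ \-\-max\-depth 2 \-\-history visited\.txt
.fi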