.\" generated with Ronn/v0.7.3
.\" http://github.com/rtomayko/ronn/tree/0.7.3
.
.TH "MAF_EXTRACT" "1" "August 2012" "BioRuby" "BioRuby Manual"
.
.SH "NAME"
\fBmaf_extract\fR \- extract blocks from MAF files
.
.SH "SYNOPSIS"
\fBmaf_extract\fR \-m MAF [\-i INDEX] \-\-interval SEQ:START\-END \fIOPTIONS\fR
.
.P
\fBmaf_extract\fR \-m MAF [\-i INDEX] \-\-bed BED \fIOPTIONS\fR
.
.P
\fBmaf_extract\fR \-d MAFDIR \-\-interval SEQ:START\-END \fIOPTIONS\fR
.
.P
\fBmaf_extract\fR \-d MAFDIR \-\-bed BED \fIOPTIONS\fR
.
.SH "DESCRIPTION"
\fBmaf_extract\fR extracts alignment blocks from one or more indexed MAF files, according to either a genomic interval specified with \fB\-\-interval\fR or multiple intervals given in a BED file specified with \fB\-\-bed\fR\.
.
.P
It can either match blocks intersecting the specified intervals with \fB\-\-mode intersect\fR, the default, or extract slices of them which cover only the specified intervals, with \fB\-\-mode slice\fR\.
.
.P
Blocks and the sequences they contain can be filtered with a variety of options including \fB\-\-only\-species\fR, \fB\-\-with\-all\-species\fR, \fB\-\-min\-sequences\fR, \fB\-\-min\-text\-size\fR, and \fB\-\-max\-text\-size\fR\.
.
.P
With the \fB\-\-join\-blocks\fR option, adjacent parsed blocks can be joined if sequence filtering has removed a species causing them to be separated\. The \fB\-\-remove\-gaps\fR option will remove columns containing only gaps (\fB\-\fR)\.
.
.P
Blocks can be output in MAF format, with \fB\-\-format maf\fR (the default), or FASTA format, with \fB\-\-format fasta\fR\. Output can be directed to a file with \fB\-\-output\fR\.
.
.P
This tool exposes almost all the random\-access functionality of the Bio::MAF::Access class\. The exception is MAF tiling, which is provided by maf_tile(1)\.
.
.SH "FILES"
A single MAF file can be processed by specifying it with \fB\-\-maf\fR\. Its accompanying index, created by maf_index(1), is specified with \fB\-\-index\fR\.
If \fB\-\-maf\fR is given but no index is specified, the entire file will be parsed to build a temporary in\-memory index\. This facilitates processing small, transient MAF files\. However, on a large file this will incur a great deal of overhead; files expected to be used more than once should be indexed with maf_index(1)\.
.
.P
MAF files can optionally be BGZF\-compressed, as produced by bgzip(1) from samtools\.
.
.P
Alternatively, a directory of indexed MAF files can be specified with \fB\-\-maf\-dir\fR; in this case, they will all be used to satisfy queries\.
.
.SH "OPTIONS"
MAF source options:
.
.TP
\fB\-m\fR, \fB\-\-maf MAF\fR
A single MAF file to process\.
.
.TP
\fB\-i\fR, \fB\-\-index INDEX\fR
An index for the file specified with \fB\-\-maf\fR, as created by maf_index(1)\.
.
.TP
\fB\-d\fR, \fB\-\-maf\-dir DIR\fR
A directory of indexed MAF files\.
.
.P
Extraction options:
.
.TP
\fB\-\-mode (intersect | slice)\fR
The extraction mode to use\. With \fB\-\-mode intersect\fR, any alignment block intersecting the genomic intervals specified will be matched in its entirety\. With \fB\-\-mode slice\fR, intersecting blocks will be matched in the same way, but columns extending outside the specified interval will be removed\.
.
.TP
\fB\-\-bed BED\fR
The specified file will be parsed as a BED file, and each interval it contains will be matched in turn\.
.
.TP
\fB\-\-interval SEQ:START\-END\fR
A single zero\-based half\-open genomic interval will be matched, with sequence identifier \fIseq\fR, (inclusive) start position \fIstart\fR, and (exclusive) end position \fIend\fR\.
.
.P
Output options:
.
.TP
\fB\-f\fR, \fB\-\-format (maf | fasta)\fR
Output will be written in the specified format, either MAF or FASTA\.
.
.TP
\fB\-o\fR, \fB\-\-output OUT\fR
Output will be written to the file \fIout\fR\.
.
.P
Filtering options:
.
.TP
\fB\-\-only\-species (SP1,SP2,SP3 | @FILE)\fR
Alignment blocks will be filtered to contain only the specified species\.
These can be given as a comma\-separated list or as a file, prefixed with \fB@\fR, from which a list of species will be read\.
.
.TP
\fB\-\-with\-all\-species (SP1,SP2,SP3 | @FILE)\fR
Only alignment blocks containing all the specified species will be matched\. These can be given as a comma\-separated list or as a file, prefixed with \fB@\fR, from which a list of species will be read\.
.
.TP
\fB\-\-min\-sequences N\fR
Only alignment blocks containing at least \fIn\fR sequences will be matched\.
.
.TP
\fB\-\-min\-text\-size N\fR
Only alignment blocks with a text size (including gaps) of at least \fIn\fR will be matched\.
.
.TP
\fB\-\-max\-text\-size N\fR
Only alignment blocks with a text size (including gaps) of at most \fIn\fR will be matched\.
.
.P
Block processing options:
.
.TP
\fB\-\-join\-blocks\fR
If sequence filtering with \fB\-\-only\-species\fR removes a species which caused two adjacent blocks to be separate, this option will join them together into a single alignment block\. The filtered blocks must contain the same sequences in contiguous positions and on the same strand\.
.
.TP
\fB\-\-remove\-gaps\fR
If sequence filtering with \fB\-\-only\-species\fR leaves a block containing columns consisting only of gap characters (\fB\-\fR), these will be removed\.
.
.TP
\fB\-\-parse\-extended\fR
Parse \fBi\fR lines, giving information on the context of sequence lines, and \fBq\fR lines, giving quality scores\.
.
.TP
\fB\-\-parse\-empty\fR
Parse \fBe\fR lines, indicating cases where a species does not align with the current block but does align with blocks before and after it\.
.
.P
Logging options:
.
.TP
\fB\-q\fR, \fB\-\-quiet\fR
Run quietly, with warnings suppressed\.
.
.TP
\fB\-v\fR, \fB\-\-verbose\fR
Run verbosely, with additional informational messages\.
.
.TP
\fB\-\-debug\fR
Log debugging information\.
.
.SH "EXAMPLES"
Extract MAF blocks intersecting with a given interval:
.
.IP "" 4
.
.nf

$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766
.
.fi
.
.IP "" 0
.
.P
As above, but operating on a single file:
.
.IP "" 4
.
.nf

$ maf_extract \-m test/data/mm8_chr7_tiny\.maf \e
    \-i test/data/mm8_chr7_tiny\.kct \e
    \-\-interval mm8\.chr7:80082592\-80082766
.
.fi
.
.IP "" 0
.
.P
Like the first case, but writing output to a file:
.
.IP "" 4
.
.nf

$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766 \e
    \-\-output out\.maf
.
.fi
.
.IP "" 0
.
.P
Extract a slice of MAF blocks over a given interval:
.
.IP "" 4
.
.nf

$ maf_extract \-d test/data \-\-mode slice \e
    \-\-interval mm8\.chr7:80082592\-80082766
.
.fi
.
.IP "" 0
.
.P
Filter for sequences from only certain species:
.
.IP "" 4
.
.nf

$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082592\-80082766 \e
    \-\-only\-species hg18,mm8,rheMac2
.
.fi
.
.IP "" 0
.
.P
Extract only blocks with all specified species:
.
.IP "" 4
.
.nf

$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082471\-80082730 \e
    \-\-with\-all\-species panTro2,loxAfr1
.
.fi
.
.IP "" 0
.
.P
Extract blocks with at least a certain number of sequences:
.
.IP "" 4
.
.nf

$ maf_extract \-d test/data \-\-interval mm8\.chr7:80082767\-80083008 \e
    \-\-min\-sequences 6
.
.fi
.
.IP "" 0
.
.P
Extract blocks with text sizes in a certain range:
.
.IP "" 4
.
.nf

$ maf_extract \-d test/data \-\-interval mm8\.chr7:0\-80100000 \e
    \-\-min\-text\-size 72 \-\-max\-text\-size 160
.
.fi
.
.IP "" 0
.
.SH "ENVIRONMENT"
\fBmaf_extract\fR is a Ruby program and relies on ordinary Ruby environment variables\.
.
.SH "BUGS"
No provision exists for writing output to multiple files\.
.
.P
FASTA description lines are always in the format \fB>source:start\-end\fR\.
.
.SH "COPYRIGHT"
\fBmaf_extract\fR is copyright (C) 2012 Clayton Wheeler\.
.
.SH "SEE ALSO"
ruby(1), maf_index(1), maf_tile(1), bgzip(1)