maf_tile(1) -- synthesize an alignment for a given region
=========================================================

## SYNOPSIS

`maf_tile` [<options>] -i [SEQ:]BEGIN:END [-s SPECIES[:NAME] ...] <maf> [index]

`maf_tile` [<options>] --bed BED -o BASE [-s SPECIES[:NAME] ...] <maf> [index]

## DESCRIPTION

**maf_tile** takes a MAF file, with optional index, or a directory of
indexed MAF files, extracts the alignment blocks overlapping the given
genomic interval, and constructs a single alignment block covering the
entire interval for the specified species. Optionally, any gaps in
coverage of the MAF file's reference sequence can be filled in from a
FASTA sequence file.

If a single interval is specified, the output will be written to
stdout in FASTA format. If a directory of MAF files is supplied as the
<maf> parameter, the interval must include the sequence identifier in
the form `sequence:begin:end`. If the `--output-base` option is
specified, `_<begin>:<end>.fa` will be appended to the given parameter
and used to construct the output path. If a BED file is specified with
`--bed`, `--output-base` is also required.

Species can be renamed for output by specifying them as SPECIES:NAME;
the first component will be used to select the species from the MAF
file, and the second will be used in the FASTA description line for
output.

## OPTIONS

  * `-r`, `--reference SEQ`:
    The FASTA reference sequence file given, which may be gzipped,
    will be used to fill in any gaps between alignment blocks.

  * `-i`, `--interval [SEQ:]BEGIN:END`:
    The given zero-based genomic interval will be used to select
    alignment blocks from the MAF file. The `SEQ:` prefix is required
    when a directory of MAF files is given.

  * `-s`, `--species SPECIES[:NAME]`:
    The given species will be selected for output. If given as
    `species:name`, it will appear in the FASTA output as `name`.

  * `-b`, `--bed BED`:
    The given BED file will be used to provide a list of intervals to
    process. If present, `--interval` will be ignored and
    `--output-base` must be given as well.
  * `-o`, `--output-base BASE`:
    The given path will be used as the base name for output files, as
    described above.

  * `-q`, `--quiet`:
    Run quietly, with warnings suppressed.

  * `-v`, `--verbose`:
    Run verbosely, with additional informational messages.

  * `--debug`:
    Log debugging information.

## EXAMPLES

Generate an alignment of the `hg19`, `petMar1`, and `ornAna1`
sequences from `chrY.maf` over the interval 14400 to 15000 on the
reference sequence of the MAF file, filling in gaps from
`chrY.refseq.fa.gz` and writing FASTA output to stdout:

    $ maf_tile --reference ~/maf/chrY.refseq.fa.gz \
               --interval 14400:15000 \
               -s hg19:human -s petMar1 -s ornAna1 \
               chrY.maf chrY.kct
    >human
    GGGTGACGAAAAGAGCCGA-----[...]
    >petMar1
    gagtgccggggagtgccggggagt[...]
    >ornAna1
    AGGGATCTGGGAATTCTGG-----[...]

Write out a FASTA file for each interval in the given BED file,
prefixed with `/tmp/mm8`, and without filling in data from a reference
sequence:

    $ maf_tile --bed /tmp/mm8.bed --output-base /tmp/mm8 \
               -s mm8:mouse -s rn4:rat -s hg18:human \
               mm8_chr7_tiny.maf mm8_chr7_tiny.kct

## FILES

The output is generated in FASTA format, with one sequence per
species.

The <maf> parameter must specify either a Multiple Alignment Format
(MAF) file or a directory of such files, with indexes.

The <index> must be a MAF index built with maf_index(1). This
parameter is ignored if the <maf> parameter is a directory. It can be
omitted if a single MAF file is given, but in this case the entire
file will be parsed to build a temporary index. For large files which
will be reused, this is not advisable.

If `--bed` is specified, its argument must be a BED file. Only the
second and third columns will be used, to specify the zero-based start
and end positions of intervals.

## ENVIRONMENT

`maf_tile` is a Ruby program and relies on ordinary Ruby environment
variables.

## COPYRIGHT

`maf_tile` is copyright (C) 2012 Clayton Wheeler.

## SEE ALSO

maf_index(1), ruby(1)
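
## NOTES

The argument forms described above (`[SEQ:]BEGIN:END` for intervals, `SPECIES[:NAME]` for species, and the second and third columns of a BED file) can be illustrated with a short Ruby sketch. The helper names `parse_interval`, `parse_species`, and `bed_intervals` are hypothetical and are not taken from `maf_tile` itself; the sketch also assumes a BED file with no header lines. It only shows how such arguments might be interpreted.

```ruby
# Hypothetical helpers for illustration only; not maf_tile's own code.

# Split "[SEQ:]BEGIN:END" into [seq_or_nil, begin, end] (zero-based positions).
def parse_interval(spec)
  parts = spec.split(':')
  raise ArgumentError, "bad interval: #{spec}" unless [2, 3].include?(parts.size)
  i_end = Integer(parts.pop, 10)
  i_begin = Integer(parts.pop, 10)
  [parts.first, i_begin, i_end]
end

# Split "SPECIES[:NAME]" into [species, output_name].
# The output name defaults to the species when no NAME is given.
def parse_species(spec)
  species, name = spec.split(':', 2)
  [species, name || species]
end

# Collect [start, end] pairs from the second and third columns of BED text.
# Assumes whitespace-separated columns and no track or browser header lines.
def bed_intervals(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    cols = line.split(/\s+/)
    [Integer(cols[1], 10), Integer(cols[2], 10)]
  end
end

p parse_interval('14400:15000')       # [nil, 14400, 15000]
p parse_interval('chrY:14400:15000')  # ["chrY", 14400, 15000]
p parse_species('hg19:human')         # ["hg19", "human"]
p parse_species('petMar1')            # ["petMar1", "petMar1"]
```

Returning `nil` for a missing sequence identifier lets a caller detect the case where a directory of MAF files was given but the interval lacks its `sequence:` prefix.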