Feature: Join alignment blocks with reference data In order to produce FASTA output with one sequence per species For use in downstream tools We need to join adjacent MAF blocks together And fill gaps in the reference sequence from reference data Scenario: Non-overlapping MAF blocks in region of interest Given MAF data: """ ##maf version=1 a score=20.0 s sp1.chr1 10 13 + 50 GGGCTGAGGGC--AG s sp2.chr5 53010 13 + 65536 GGGCTGACGGC--AG s sp3.chr2 33010 15 + 65536 AGGTTTAGGGCAGAG a score=21.0 s sp1.chr1 30 10 + 50 AGGGCGGTCC s sp2.chr5 53030 10 + 65536 AGGGCGGTGC """ And chromosome reference sequence: """ >sp1.chr1 CCAGGATGCT GGGCTGAGGG CAGTTGTGTC AGGGCGGTCC GGTGCAGGCA """ When I open it with a MAF reader And build an index on the reference sequence And tile sp1.chr1:0-50 with the chromosome reference And tile with species [sp1, sp2, sp3] And write the tiled data as FASTA Then the FASTA data obtained should be: """ >sp1 CCAGGATGCTGGGCTGAGGGC--AGTTGTGTCAGGGCGGTCCGGTGCAGGCA >sp2 **********GGGCTGACGGC--AG*******AGGGCGGTGC********** >sp3 **********AGGTTTAGGGCAGAG*************************** """ Scenario: Non-overlapping MAF blocks with species map Given MAF data: """ ##maf version=1 a score=20.0 s sp1.chr1 10 13 + 50 GGGCTGAGGGC--AG s sp2.chr5 53010 13 + 65536 GGGCTGACGGC--AG s sp3.chr2 33010 15 + 65536 AGGTTTAGGGCAGAG a score=21.0 s sp1.chr1 30 10 + 50 AGGGCGGTCC s sp2.chr5 53030 10 + 65536 AGGGCGGTGC """ And chromosome reference sequence: """ >sp1.chr1 CCAGGATGCT GGGCTGAGGG CAGTTGTGTC AGGGCGGTCC GGTGCAGGCA """ When I open it with a MAF reader And build an index on the reference sequence And tile sp1.chr1:0-50 with the chromosome reference And tile with species [sp1, sp2, sp3] And map species sp1 as mouse And map species sp2 as hippo And map species sp3 as squid And write the tiled data as FASTA Then the FASTA data obtained should be: """ >mouse CCAGGATGCTGGGCTGAGGGC--AGTTGTGTCAGGGCGGTCCGGTGCAGGCA >hippo **********GGGCTGACGGC--AG*******AGGGCGGTGC********** >squid **********AGGTTTAGGGCAGAG*************************** """ Scenario: Subset of non-overlapping MAF blocks in region Given MAF data: """ ##maf version=1 a score=20.0 s sp1.chr1 10 13 + 50 GGGCTGAGGGC--AG s sp2.chr5 53010 13 + 65536 GGGCTGACGGC--AG s sp3.chr2 33010 15 + 65536 AGGTTTAGGGCAGAG a score=21.0 s sp1.chr1 30 10 + 50 AGGGCGGTCC s sp2.chr5 53030 10 + 65536 AGGGCGGTGC """ And chromosome reference sequence: """ >sp1.chr1 CCAGGATGCT GGGCTGAGGG CAGTTGTGTC AGGGCGGTCC GGTGCAGGCA """ When I open it with a MAF reader And build an index on the reference sequence And tile sp1.chr1:12-36 with the chromosome reference And tile with species [sp1, sp2, sp3] And write the tiled data as FASTA Then the FASTA data obtained should be: """ >sp1 GCTGAGGGC--AGTTGTGTCAGGGCG >sp2 GCTGACGGC--AG*******AGGGCG >sp3 GTTTAGGGCAGAG************* """ Scenario: Overlapping MAF blocks in region of interest Given MAF data: """ ##maf version=1 a score=20.0 s sp1.chr1 10 13 + 50 GGGCTGAGGGC--AG s sp2.chr5 53010 13 + 65536 GGGCTGACGGC--AG s sp3.chr2 33010 15 + 65536 AGGTTTAGGGCAGAG a score=21.0 s sp1.chr1 20 10 + 50 AGGGCGGTCC s sp2.chr5 53020 10 + 65536 AGGGCGGTGC """ And chromosome reference sequence: """ >sp1.chr1 CCAGGATGCT GGGCTGAGGG CAGTTGTGTC AGGGCGGTCC GGTGCAGGCA """ When I open it with a MAF reader And build an index on the reference sequence And tile sp1.chr1:0-50 with the chromosome reference And tile with species [sp1, sp2, sp3] And write the tiled data as FASTA Then the FASTA data obtained should be: """ >sp1 CCAGGATGCTGGGCTGAGGGAGGGCGGTCCAGGGCGGTCCGGTGCAGGCA >sp2 **********GGGCTGACGGAGGGCGGTGC******************** >sp3 **********AGGTTTAGGG****************************** """ @no_jruby Scenario: Tile with CLI tool and reference seq Given test files: | gap-sp1.fa.gz | | gap-1.maf | | gap-1.kct | When I run `maf_tile --reference gap-sp1.fa.gz --interval 0-50 -s sp1:mouse -s sp2:nautilus -s sp3:jaguar gap-1.maf gap-1.kct` Then it should pass with: """ >mouse CCAGGATGCTGGGCTGAGGGC--AGTTGTGTCAGGGCGGTCCGGTGCAGGCA >nautilus **********GGGCTGACGGC--AG*******AGGGCGGTGC********** >jaguar **********AGGTTTAGGGCAGAG*************************** """ @no_jruby Scenario: Tile with CLI tool and no reference seq Given test files: | gap-1.maf | | gap-1.kct | When I run `maf_tile --interval 0-50 -s sp1:mouse -s sp2:nautilus -s sp3:jaguar gap-1.maf gap-1.kct` Then it should pass with: """ >mouse NNNNNNNNNNGGGCTGAGGGC--AGNNNNNNNAGGGCGGTCCNNNNNNNNNN >nautilus **********GGGCTGACGGC--AG*******AGGGCGGTGC********** >jaguar **********AGGTTTAGGGCAGAG*************************** """ @no_jruby Scenario: Tile with CLI tool and BED intervals Given test files: | gap-1.maf | | gap-1.kct | | gap-sp1.fa.gz | And a file named "example.bed" with: """ sp1.chr1 12 36 """ When I run `maf_tile -s sp1:mouse -s sp2:nautilus -s sp3:jaguar --output-base selected --bed example.bed --reference gap-sp1.fa.gz gap-1.maf gap-1.kct` Then it should pass with: """ """ And the file "selected_12-36.fa" should contain exactly: """ >mouse GCTGAGGGC--AGTTGTGTCAGGGCG >nautilus GCTGACGGC--AG*******AGGGCG >jaguar GTTTAGGGCAGAG************* """ @no_jruby Scenario: Tile with CLI tool and implicit index Given test files: | mm8_chr7_tiny.maf | | mm8_chr7_tiny.kct | When I run `maf_tile -s mm8 -s rn4 -s hg18 --interval 80082334-80082344 mm8_chr7_tiny.maf` Then it should pass with: """ >mm8 GGGCTGAGGG >rn4 GGGCTGAGGG >hg18 --------GG """ @no_jruby Scenario: Tile with CLI tool and directory Given test files: | mm8_chr7_tiny.maf | | mm8_chr7_tiny.kct | When I run `maf_tile -s mm8 -s rn4 -s hg18 --interval mm8.chr7:80082334-80082344 .` Then it should pass with: """ >mm8 GGGCTGAGGG >rn4 GGGCTGAGGG >hg18 --------GG """ @no_jruby Scenario: Tile with CLI tool and directory, 1-based Given test files: | mm8_chr7_tiny.maf | | mm8_chr7_tiny.kct | When I run `maf_tile -s mm8 -s rn4 -s hg18 --one-based --interval mm8.chr7:80082335-80082344 .` Then it should pass with: """ >mm8 GGGCTGAGGG >rn4 GGGCTGAGGG >hg18 --------GG """