README.rdoc in full_lengther_next-0.0.2 vs README.rdoc in full_lengther_next-0.0.5

- old
+ new

@@ -14,21 +14,55 @@
 * FULL-LENGTHERNEXT fixes frame shifts.
 * It returns the translated protein sequence for the complete genes and the nucleotide sequence with frame shift fixed and highlighting the start and end codon for an easier finding of the gene and the UTR regions.
-* FULL-LENGTHERNEXT suggests putative new genes analysing what of the genes classified as unknown are probably coding.
+* FULL-LENGTHERNEXT suggests putative new genes analysing what of the genes classified as unknown are probably coding and what are putative non coding RNA sequences.
-* It produces a stats file useful for assemblies comparison.
+* It produces a HTML file with statistics useful for assemblies comparison.
 == SYNOPSIS:
 FULL-LENGTHERNEXT must be fed with a multifasta file containing all unigenes to analyse and which group belongs the organism under study among fungi, human, invertebrates, mammals, plants, rodents or vertebrates, to use the most appropriate databases. Furthermore, it is possible parametrizing the number of cpus to be used (workers), the minimum identity percent (default = 45%) and minimum e value (default = 1e-25) thresholds, the maximum distance between query and subject gene limits (default = 15 amino acids) and a user database of complete proteins if desired.
 full_lengther_next -f input.fasta -g [fungi|human|invertebrates|mammals|plants|rodents|vertebrates] -d user_db [options]
+=== Output
+Full-LengthNext results files appear at the end of program execution, grouped in a folder called fl2_results, where the following files can be found:
+* alignments.txt: Displays the BLASTx alignment between our query sequence translated into amino acids and the protein sequence from the Full-LengthNext database.
+* annotations.txt: in this file, the main information for each query sequence can be found; status, subject accession number, subject description, warning messages, protein obtained and indices provided by BLASTx alignment.
+* nc_rna.txt: Putative non coding RNA sequences detected using BLAST.
+* nt_seq.txt: It contains the nucleotide sequence, marking when possible the start codon with hyphen and underscore and hyphen (-_-) and the stop codon with three underscores. Useful to find UTRs and gene sequence.
+* proteins.fasta: fasta format file with the complete proteins.
+* summary_stats.html: summary statistics of the results obtained by Full-LengthNext for the set of query unigenes. It is useful for assemblies comparison.
+* tcode_result.txt: It is equivalent to annotations.txt file, but it is used for sequences with no similarity in databases. Possible status are: coding, non-coding or unknown
+
+=== CLUSTERED INSTALLATION
+To install FULL-LENGTHERNEXT into a cluster, you need to have the software available on all machines. By installing it on a shared location, or installing it on each cluster node. Once installed, you need to create a init_file where your environment is correctly setup (paths, BLASTDB, etc):
+
+export PATH=/apps/blast+/bin:/apps/cd-hit/bin
+export BLASTDB=/var/DB/formatted
+export FULL_LENGTHER_NEXT_INIT=path_to_init_file
+And initialize the FULL_LENGTHER_NEXT_INIT environment variable on your main node (from where FULL-LENGTHERNEXT will be initially launched):
+
+export FULL_LENGTHER_NEXT_INIT=path_to_init_file
+If you use any queue system like PBS Pro or Moab/Slurm, be sure to initialize the variables on each submission script.
+
+NOTE: all nodes on the cluster should use ssh keys to allow FULL-LENGTHERNEXT to launch workers without asking for a password.
+
+SAMPLE INIT FILES FOR CLUSTERED INSTALLATION:
+Init file
+$> cat fln_init_env
+
+source ~ruby19/init_env
+source ~blast_plus/init_env
+
+export BLASTDB=~full_lenghter_next/DB/formatted/
+export FULL_LENGTHER_NEXT_INIT=~full_lenghter_next/fln_init_env
+
+
 === PBS Submission script
 $> cat sample_work.sh
 # 12 distributed workers and 1 GB memory per worker:
@@ -40,14 +74,14 @@
 # create workers file with assigned node names
 cat ${PBS_NODEFILE} > workers
-# init seqtrimnext
-source ~seqtrimnext/init_env
+# init full-lengthernext
+source ~full_lenghter_next/init_env
-time seqtrimnext -t paired_ends.txt -Q fastq -w workers -s 10.0.0
+time full_lenghter_next -f input.fasta -g group -d user_db -w workers -s 10.0.0
 Once this submission script is created, you only need to launch it with:
 qsub sample_work.sh
 == REQUIREMENTS:
@@ -99,10 +133,14 @@
 gem install full_lengther_next
 === Install and rebuild Full-LengthNEXT databases
-Full-LengthNEXT needs some databases to work. You can use the BLASTDB environment variable to to change the default database location. To install them, execute:
+Full-LengthNEXT needs some databases to work. You can use the BLASTDB environment variable to to change the default database location. To set the path for storing databases, execute next line in your terminal or add it to your .bash_profile:
+
+export BLASTDB=/my_path/
+
+To install databases execute:
 $ download_fln_dbs.rb
 ==== User database
\ No newline at end of file
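
Reviewer note: the clustered-installation text added in 0.0.5 depends on two environment variables, BLASTDB and FULL_LENGTHER_NEXT_INIT, being set on the launching node and in every submission script. A minimal pre-flight sketch follows, assuming bash. The function name fln_preflight is hypothetical and not part of the gem, and the two checks (BLASTDB is an existing directory, the init file is readable) are assumptions about a sane setup, not something the README says FULL-LENGTHERNEXT enforces.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with full_lengther_next.
# Returns 0 when both variables named in the README look usable,
# 1 otherwise, printing one line per problem to stderr.
fln_preflight() {
  local status=0

  # BLASTDB is assumed to name an existing directory of formatted databases.
  if [ -z "${BLASTDB:-}" ]; then
    echo "BLASTDB is not set" >&2
    status=1
  elif [ ! -d "${BLASTDB}" ]; then
    echo "BLASTDB is not a directory: ${BLASTDB}" >&2
    status=1
  fi

  # FULL_LENGTHER_NEXT_INIT is assumed to name a readable init file.
  if [ -z "${FULL_LENGTHER_NEXT_INIT:-}" ]; then
    echo "FULL_LENGTHER_NEXT_INIT is not set" >&2
    status=1
  elif [ ! -r "${FULL_LENGTHER_NEXT_INIT}" ]; then
    echo "FULL_LENGTHER_NEXT_INIT is not readable: ${FULL_LENGTHER_NEXT_INIT}" >&2
    status=1
  fi

  return "${status}"
}
```

In a submission script this could be called as `fln_preflight || exit 1` right after sourcing the init file, so a misconfigured node fails before any worker is launched.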