Download svap-2.0.0beta
SPLICE VARIANTS ASSEMBLING PROGRAM
----------------------------------

Splice Variants Assembling Program (SVAP) is a meta-assembler for RNA-seq
experiments. It takes Cufflinks outputs as input and produces assembled
transcripts in CML format as output.

PREREQUISITES
-------------

G++ 4.4.5 is recommended; other versions later than 3.4 should also work.
C++ Boost library 1.49 is recommended; other versions should also work.
perl 5
perl(Getopt::Long)
perl(File::Basename)

INSTALL
-------

./configure --prefix=/path/to/be/installed
make
make install

USAGE
-----

1. dosvap.pl
   This is the main entry program of SVAP.
   Usage: dosvap.pl -i input1.gtf input2.gtf ... inputn.gtf -o output_dir/

2. svap
   This is the svap standard program. It takes a cluster of transcripts in
   CML format as input and produces a set of assembled transcripts in CML
   format as output.

3. create_transcript_sketch.pl
   A helper program that draws transcript sketch pictures from the input
   and output CML files.

For more information, please refer to examples/examples.sh.

COMMON LINE FORMAT (CML Format)
-------------------------------

Common Line Format (CML Format) is the SVAP standard input/output file
format. The CML coordinate system is 0-based and uses left-closed,
right-open intervals, [start, end). The length mapped on the chromosome
is therefore calculated as:

    length = end - start

Each line of a CML file represents a sequence, and every line contains
16 columns:

01. A number for internal use, generally set to 0.
02. Sequence ID
03. Chromosome
04. Sequence start coordinate on chromosome
05. Sequence end coordinate on chromosome
06. Sequence strand: -1 for minus strand, 1 for plus strand, 0 for unknown
07. Sequence length. This is NOT the length mapped on the chromosome; it is
    the real sequence length.
08. Number of exons in the sequence
09. Exon start coordinates on chromosome, separated by commas
10. Exon end coordinates on chromosome, separated by commas
11. Exon frames, separated by commas; each frame value is 0, 1, 2, or 3
    for unknown
12. Exon types (input) / IDs (output), separated by commas
13. Sequence source
14. Sequence type, e.g. mRNA, lncRNA, etc.
15. Sequence user-defined score
16. Sequence attribute field. Format is
    ATTR_NAME1=ATTR_VALUE1;ATTR_NAME2=ATTR_VALUE2; ...

COMMON EXON FORMAT (CME Format)
-------------------------------

Common Exon Format (CME Format) is the SVAP standard output file format for
identified exons. Each line represents an exon, and every line contains
6 columns, which are separated by <tab>:

01. Exon ID
02. Chromosome
03. Exon start coordinate on chromosome
04. Exon end coordinate on chromosome
05. Exon strand: -1 for minus strand, 1 for plus strand, 0 for unknown
06. IDs of the input sequences that contain this newly identified exon.

CONTACT
-------

Please send suggestions and questions to KONG Lei <kongl@mail.cbi.pku.edu.cn>