Download svap-2.0.0beta

SPLICE VARIANTS ASSEMBLING PROGRAM
----------------------------------
Splice Variants Assembling Program (SVAP) is a meta-assembler for RNA-seq experiments. It
takes Cufflinks output as input and writes the assembled transcripts in CML format.


PREREQUISITES
-------------
G++ 4.4.5 is recommended; other versions newer than 3.4 should also work.
C++ Boost library 1.49 is recommended; other versions should also work.
perl 5
perl(Getopt::Long)
perl(File::Basename)


INSTALL
-------
./configure --prefix=/path/to/be/installed
make
make install


USAGE
-----
1. dosvap.pl
This is the main entry program of SVAP.
Usage: dosvap.pl -i input1.gtf input2.gtf ... inputn.gtf -o output_dir/

2. svap
This is the standard svap program. It takes a cluster of transcripts in CML format as input
and writes a set of assembled transcripts in CML format as output.

3. create_transcript_sketch.pl
A helper program that draws sketch pictures of transcripts from the input and output CML files.

For more information, please refer to examples/examples.sh.


COMMON LINE FORMAT (CML Format)
-------------------------------
Common Line Format (CML Format) is the SVAP standard input/output file format. The CML
coordinate system is 0-based and uses left-closed, right-open intervals, [start, end). The
length mapped on the chromosome is therefore calculated as:

length = end - start 
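
As a sketch of what the half-open convention means in practice, the Python snippet below
converts a 1-based, closed interval (the convention used by GTF files) to CML coordinates
and computes the mapped length. The function names are hypothetical and are not part of
SVAP.

```python
def gtf_to_cml(start, end):
    """Convert a 1-based closed interval [start, end] to 0-based half-open [start, end)."""
    if start < 1 or end < start:
        raise ValueError("invalid 1-based closed interval")
    # Only the start shifts; the inclusive 1-based end equals the exclusive 0-based end.
    return start - 1, end


def mapped_length(start, end):
    """Length on the chromosome of a CML interval [start, end)."""
    return end - start


cml_start, cml_end = gtf_to_cml(101, 200)  # bases 101..200, i.e. 100 bases
print(cml_start, cml_end, mapped_length(cml_start, cml_end))  # prints: 100 200 100
```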

Each line of a CML file represents a sequence and contains 16 columns, which are
separated by <tab>:

01. This column is a number for internal use, generally set to 0. 
02. Sequence ID
03. Chromosome
04. Sequence start coordinate on chromosome
05. Sequence end coordinate on chromosome
06. Sequence strand: -1 for minus strand, 1 for plus strand, 0 for unknown
07. Sequence length. This is NOT the length mapped on the chromosome; it is the real sequence length.
08. Number of exons in the sequence
09. Exon start coordinates on chromosome, separated by commas
10. Exon end coordinates on chromosome, separated by commas
11. Exon frames, separated by commas; each frame value is 0, 1, or 2, or 3 for unknown
12. Exon types (input) / IDs (output), separated by commas
13. Sequence source
14. Sequence type, e.g. mRNA, lncRNA, etc.
15. User-defined sequence score
16. Sequence Attribute field. Format is ATTR_NAME1=ATTR_VALUE1;ATTR_NAME2=ATTR_VALUE2; ... 
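
The 16 numbered columns listed above can be read with a few lines of Python. This is only a
minimal sketch and not part of SVAP: the tab separator is an assumption, the class and
function names are hypothetical, and the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CmlRecord:
    internal: str
    seq_id: str
    chrom: str
    start: int
    end: int
    strand: int
    length: int
    exon_starts: list
    exon_ends: list
    frames: list
    exon_labels: list
    source: str
    seq_type: str
    score: str
    attributes: dict


def split_list(text):
    """Split a comma-separated column, ignoring empty items such as a trailing comma."""
    return [item for item in text.split(",") if item]


def parse_cml_line(line, sep="\t"):
    """Parse one CML line; `sep` is an assumption (tab), adjust it if needed."""
    cols = line.rstrip("\n").split(sep)
    if len(cols) != 16:
        raise ValueError(f"expected 16 columns, got {len(cols)}")
    starts = [int(x) for x in split_list(cols[8])]
    ends = [int(x) for x in split_list(cols[9])]
    if not (len(starts) == len(ends) == int(cols[7])):
        raise ValueError("exon count does not match the exon coordinate lists")
    # Column 16: ATTR_NAME=ATTR_VALUE pairs separated by ';'
    attributes = {}
    for item in cols[15].split(";"):
        if "=" in item:
            name, value = item.split("=", 1)
            attributes[name.strip()] = value.strip()
    return CmlRecord(
        internal=cols[0], seq_id=cols[1], chrom=cols[2],
        start=int(cols[3]), end=int(cols[4]), strand=int(cols[5]),
        length=int(cols[6]), exon_starts=starts, exon_ends=ends,
        frames=[int(x) for x in split_list(cols[10])],
        exon_labels=split_list(cols[11]),
        source=cols[12], seq_type=cols[13], score=cols[14],
        attributes=attributes,
    )


# Invented sample values, for illustration only.
sample = "\t".join([
    "0", "tx1", "chr1", "100", "400", "1", "200", "2",
    "100,300", "200,400", "3,3", "a,b", "demo", "mRNA", "0",
    "gene=g1;note=demo;",
])
record = parse_cml_line(sample)
print(record.seq_id, record.end - record.start, record.attributes)
```

In this sample the mapped length (end - start) is 300, while column 07 holds a separate
sequence length of 200.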


COMMON EXON FORMAT (CME Format)
-------------------------------
Common Exon Format (CME Format) is the SVAP standard output file format for identified exons.
Each line represents an exon and contains 6 columns, which are separated by <tab>:
01. Exon ID
02. Chromosome
03. Exon start coordinate on chromosome
04. Exon end coordinate on chromosome
05. Exon strand: -1 for minus strand, 1 for plus strand, 0 for unknown
06. IDs of the input sequences that contain this newly identified exon.
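
A CME line can be read in the same way. The sketch below is not part of SVAP: the function
name is hypothetical, the sample line is invented, and the comma used to split the IDs in
column 06 is an assumption, since the delimiter inside that column is not specified here.

```python
def parse_cme_line(line):
    """Parse one tab-separated CME line into a dictionary."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 6:
        raise ValueError(f"expected 6 columns, got {len(cols)}")
    exon_id, chrom, start, end, strand, seq_ids = cols
    return {
        "exon_id": exon_id,
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
        # Assumption: the input sequence IDs in column 06 are comma-separated.
        "sequence_ids": [s for s in seq_ids.split(",") if s],
    }


# Invented sample line, for illustration only.
exon = parse_cme_line("exon1\tchr1\t100\t200\t-1\ttx1,tx2")
print(exon["exon_id"], exon["strand"], exon["sequence_ids"])
```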


CONTACT
-------
Please send suggestions and questions to KONG Lei <kongl@mail.cbi.pku.edu.cn>.