Introduction
This tool uses the TransDecoder software, together with reference genes, to predict coding sequences.
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity, or constructed based on RNA-Seq alignments to the genome using TopHat and Cufflinks.
The software is primarily maintained by Brian Haas at the Broad Institute and Alexie Papanicolaou at the Commonwealth Scientific and Industrial Research Organisation (CSIRO). It is integrated into other related software such as Trinity, PASA, EVidenceModeler, and Trinotate.
Full documentation is provided at: http://transdecoder.github.io
Input
Transcript FASTA file:
Assembled sequences, in FASTA format.
For example:
>seqname1
AAAAAAAAGCTACTTGGAGTACCAATAATAAAGTGAGCCCACCTTCCTGGTACCCAGACATTTC
>seqname2
GGAGTACCAATAATAAAGTGAGCCCACCTTCCTGGTACCCAGAC
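To check that a transcript file follows the layout shown above before submitting it, a small parser can help. The following is a minimal sketch, not part of TransDecoder; the function name `read_fasta` is hypothetical, and it assumes each record starts with a `>` header line followed by one or more sequence lines.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence name: sequence} mapping."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The header: keep the first whitespace-delimited token as the name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            # A sequence line: append it to the current record.
            records[name] += line
    return records


# The two example transcripts from this section.
example = """>seqname1
AAAAAAAAGCTACTTGGAGTACCAATAATAAAGTGAGCCCACCTTCCTGGTACCCAGACATTTC
>seqname2
GGAGTACCAATAATAAAGTGAGCCCACCTTCCTGGTACCCAGAC
"""

records = read_fasta(example.splitlines())
for seq_name, seq in records.items():
    print(seq_name, len(seq))
```

For a real file, pass an open file handle instead of `example.splitlines()`.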
FASTA file for the Pfam alignment:
Amino acid sequences of the reference genes, in FASTA format. For example:
>gene1
MLSFFTKNTLTKRKLIMLALAIVFTFFAFGLYFIPHDEISVFDFKLPALQYETTVTSLD
Parameters
-T <int> top longest ORFs to train Markov Model (hexamer stats) (default: 500)
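To make the idea behind -T concrete, the sketch below selects the longest ORFs from a candidate set and tabulates hexamer counts over them. It is an illustration only and does not reproduce TransDecoder's implementation: the function names `top_longest` and `hexamer_counts` are hypothetical, and counting hexamers in frame (stepping one codon at a time) is an assumption made for this example.

```python
from collections import Counter
from typing import List


def top_longest(orfs: List[str], top: int = 500) -> List[str]:
    """Return the `top` longest ORF sequences, longest first."""
    return sorted(orfs, key=len, reverse=True)[:top]


def hexamer_counts(orfs: List[str]) -> Counter:
    """Count hexamers (two adjacent codons) across the given ORFs.

    Assumption for this sketch: hexamers are read in frame, so the
    window advances by 3 nucleotides at a time.
    """
    counts: Counter = Counter()
    for seq in orfs:
        for i in range(0, len(seq) - 5, 3):
            counts[seq[i:i + 6]] += 1
    return counts


# Toy candidate ORFs (made up for illustration).
candidates = ["ATGAAATAG", "ATGAAACCCGGGTAA", "ATGTAA"]

# Equivalent in spirit to "-T 2": train on the two longest ORFs only.
training_set = top_longest(candidates, top=2)
counts = hexamer_counts(training_set)
print(training_set)
print(counts.most_common(3))
```

A larger -T value gives the model more training sequence, while a smaller value restricts training to the longest ORFs.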
Results
transdecoder.pep
transdecoder.cds
transdecoder.bed
transdecoder.gff3