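1. Genome sequence
2. Prodigal gene prediction
Introduction
Prodigal predicts protein-coding genes in bacterial and archaeal genomes.
It can analyze a single genome or a metagenome.
It produces the gene predictions, the nucleotide sequences of the genes, and their protein translations.
Input
Genome sequence:
FASTA format, for example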
>seqname1
AAAAAAAAGCTACTTGGAGTACCAATAATAAAGTGAGCCCACCTTCCTGGTACCCAGACATTTC
g: translation table (genetic code) used for the predicted genes (default 11).
p: prediction mode, single (one genome) or meta (metagenome). Default is single.
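The two parameters correspond to Prodigal's `-g` and `-p` options. The following is a minimal sketch of how a run could be assembled, assuming Prodigal is installed as `prodigal` on the PATH. The helper name `build_prodigal_cmd` and all file names are hypothetical and not part of this pipeline.

```python
import shlex


def build_prodigal_cmd(genome_fasta, prefix, table=11, mode="single"):
    """Return the argument list for one Prodigal run (nothing is executed)."""
    if mode not in ("single", "meta"):
        raise ValueError("mode must be 'single' or 'meta'")
    return [
        "prodigal",
        "-i", genome_fasta,      # input genome in FASTA format
        "-g", str(table),        # translation table (default 11)
        "-p", mode,              # single genome or metagenome
        "-f", "gff",             # write the gene predictions as GFF
        "-o", f"{prefix}.gff",   # gene predictions (hypothetical file name)
        "-d", f"{prefix}.ffn",   # nucleotide sequences of the genes
        "-a", f"{prefix}.faa",   # protein translations
    ]


cmd = build_prodigal_cmd("genome.fasta", "genes")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.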
Results
Gene GFF file
For example
##gff-version 3
# Sequence Data: seqnum=1;seqlen=5129;seqhdr="scaffold4 61.7"
# Model Data: version=Prodigal.v2.6.1;run_type=Single;model="Ab initio";gc_cont=71.11;transl_table=11;uses_sd=1
scaffold4 Prodigal_v2.6.1 CDS 2 4498 430.8 - 0 ID=1_1;partial=10;start_type=GTG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.697;conf=99.99;score=430.82;cscore=429.13;sscore=1.69;rscore=10.03;uscore=-5.18;tscore=-3.17;
Gene nucleotide sequence file
Gene amino acid sequence file
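The GFF feature line carries most of its detail in the last column as `key=value` pairs. The sketch below shows one way to read such a line. It assumes the file is tab-separated, as GFF3 files normally are, and it uses a shortened attribute string. The function name `parse_gff_line` is hypothetical.

```python
def parse_gff_line(line):
    """Split one tab-separated GFF3 feature line into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # The attribute column is a ';'-separated list of key=value pairs
    attributes = dict(
        item.split("=", 1) for item in attrs.strip(";").split(";") if item
    )
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),   # 1-based, inclusive
        "end": int(end),       # 1-based, inclusive
        "strand": strand,
        "attributes": attributes,
    }


# Shortened version of the example line above (assumed tab-separated)
line = "\t".join([
    "scaffold4", "Prodigal_v2.6.1", "CDS", "2", "4498", "430.8", "-", "0",
    "ID=1_1;partial=10;start_type=GTG;conf=99.99;",
])
gene = parse_gff_line(line)
print(gene["attributes"]["partial"], gene["end"] - gene["start"] + 1)
```

In Prodigal's output a `1` in the `partial` field marks an edge where the gene is incomplete, so `partial=10` here indicates a gene that runs off the left end of the scaffold, consistent with its start coordinate of 2.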
3. tRNA prediction
Introduction
tRNAscan-SE: An improved tool for transfer RNA detection
tRNAscan-SE was written in the PERL (version 5) script language.
Input consists of DNA or RNA sequences in FASTA format. tRNA
predictions are output in standard tabular or ACeDB format.
tRNAscan-SE does no tRNA detection itself, but instead combines the
strengths of three independent tRNA prediction programs by negotiating
the flow of information between them, performing a limited amount of
post-processing, and outputting the results in one of several
formats.
Input
Sequence file:
FASTA format, for example
>seqname1
AAAAAAAAGCTACTTGGAGTACCAATAATAAAGTGAGCCCACCTTCCTGGTACCCAGACATTTC
Search type:
-E : search for eukaryotic tRNAs (default)
-B : search for bacterial tRNAs
-A : search for archaeal tRNAs
-M <model> : search for mitochondrial tRNAs
options: mammal, vert
-O : search for other organellar tRNAs
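As a rough illustration, the search-type options above could be turned into a command line as follows. This is a sketch under assumptions: the `-o`, `-f`, and `-a` output options are taken from tRNAscan-SE 2.0 and should be checked against `tRNAscan-SE -h` for the installed version, the output names follow the result files described in this section, and the helper `build_trnascan_cmd` is hypothetical.

```python
def build_trnascan_cmd(fasta, search="E", mito_model=None, prefix="out"):
    """Return the argument list for one tRNAscan-SE run (nothing is executed)."""
    if search == "M":
        # -M needs a model name; the options listed above are mammal and vert
        if mito_model not in ("mammal", "vert"):
            raise ValueError("mito_model must be 'mammal' or 'vert'")
        mode = ["-M", mito_model]
    elif search in ("E", "B", "A", "O"):
        mode = [f"-{search}"]
    else:
        raise ValueError(f"unknown search type: {search}")
    return [
        "tRNAscan-SE", *mode,
        "-o", f"{prefix}.gff",         # summary table (assumed 2.0 option)
        "-f", f"{prefix}.tRNA.struc",  # secondary structures (assumed 2.0 option)
        "-a", "trna.fasta",            # predicted tRNA sequences (assumed 2.0 option)
        fasta,
    ]


cmd = build_trnascan_cmd("genome.fasta", search="B")
print(" ".join(cmd))
```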
Results
1. tRNA summary table
out.gff
For example
Sequence tRNA Bounds tRNA Anti Intron Bounds Cove
Name tRNA # Begin End Type Codon Begin End Score Note
-------- ------ ----- ------ ---- ----- ----- ---- ------ ------
tig00000462_pilon_pilon 1 1742900 1742984 Leu GAG 0 0 57.02
tig00000462_pilon_pilon 2 2139254 2139329 Leu CAA 0 0 51.89
tig00000462_pilon_pilon 3 2955236 2955309 Gly TCC 0 0 81.79
tig00000462_pilon_pilon 4 3034584 3034656 Arg TCT 0 0 78.55
2. tRNA structure file
out.tRNA.struc
For example
tig00000462_pilon_pilon.trna1 (1742900-1742984) Length: 85 bp
Type: Leu Anticodon: GAG at 34-36 (1742933-1742935) Score: 57.02
* | * | * | * | * | * | * | * |
Seq: GTCCGGGTGGCGGAATGGCaGACGCGCTAGCTTGAGGTGCTAGTGCCCTTTATCGGGCGTGGGGGTTCAAGTCCCCCCTCGGACA
Str: >>>>>>>..>>>..........<<<.>>>>>.......<<<<<.>>>>......<<<<..>>>>>.......<<<<<<<<<<<<.
3. tRNA sequence file
trna.fasta
For example
>tig00000462_pilon_pilon.trna1 tig00000462_pilon_pilon:1742900-1742984 (+) Leu (GAG) 85 bp Sc: 57.02
GTCCGGGTGGCGGAATGGCAGACGCGCTAGCTTGAGGTGCTAGTGCCCTTTATCGGGCGT
GGGGGTTCAAGTCCCCCCTCGGACA
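The summary table in item 1 is whitespace-separated, so it can be read without a dedicated parser. The sketch below is one possible approach, with the hypothetical function name `parse_trna_table`. It assumes the column order shown in the example header and assumes that a tRNA on the minus strand is reported with Begin greater than End.

```python
def parse_trna_table(text):
    """Parse the rows of a tRNAscan-SE summary table into dicts."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 9 fields and a numeric tRNA index in the
        # second column; the header and separator lines do not.
        if len(fields) < 9 or not fields[1].isdigit():
            continue
        begin, end = int(fields[2]), int(fields[3])
        records.append({
            "seq": fields[0],
            "index": int(fields[1]),
            "begin": begin,
            "end": end,
            "strand": "+" if begin <= end else "-",  # assumption, see lead-in
            "type": fields[4],
            "anticodon": fields[5],
            "score": float(fields[8]),
            "length": abs(end - begin) + 1,
        })
    return records


table = """\
Sequence tRNA Bounds tRNA Anti Intron Bounds Cove
Name tRNA # Begin End Type Codon Begin End Score Note
-------- ------ ----- ------ ---- ----- ----- ---- ------ ------
tig00000462_pilon_pilon 1 1742900 1742984 Leu GAG 0 0 57.02
tig00000462_pilon_pilon 2 2139254 2139329 Leu CAA 0 0 51.89
"""
for rec in parse_trna_table(table):
    print(rec["index"], rec["type"], rec["anticodon"], rec["length"], rec["score"])
```

The length computed for the first row, 85 bp, agrees with the `Length: 85 bp` line in the structure file example.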
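4. rRNA prediction
Introduction
Barrnap predicts rRNA genes in bacterial, archaeal, mitochondrial, and eukaryotic genomes.
A script then extracts the predicted rRNA sequences.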
If you use Barrnap in your work, please cite:
Seemann T
barrnap 0.9 : rapid ribosomal RNA prediction
https://github.com/tseemann/barrnap
Input
fasta: genome sequence file
For example
>seqname1
AAAAAAAAGCTACTTGGAGTACCAATAATAAAGTGAGCCCACCTTCCTGGTACCCAGACATTTC
kingdom : Kingdom: bac euk mito arc (default 'bac')
lencutoff : Proportional length threshold to label as partial (default '0.8')
reject : Proportional length threshold to reject prediction (default '0.25')
evalue : Similarity e-value cut-off (default '1e-06')
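These four parameters map directly onto barrnap's long options of the same names. A minimal sketch of assembling the call, assuming barrnap is installed as `barrnap` on the PATH. The helper `build_barrnap_cmd` and the file names are hypothetical.

```python
def build_barrnap_cmd(fasta, kingdom="bac", lencutoff=0.8, reject=0.25,
                      evalue=1e-06):
    """Return the argument list for one barrnap run (nothing is executed)."""
    if kingdom not in ("bac", "euk", "mito", "arc"):
        raise ValueError("kingdom must be one of: bac, euk, mito, arc")
    return [
        "barrnap",
        "--kingdom", kingdom,
        "--lencutoff", str(lencutoff),  # below this fraction: labelled partial
        "--reject", str(reject),        # below this fraction: prediction dropped
        "--evalue", str(evalue),
        fasta,
    ]


cmd = build_barrnap_cmd("genome.fasta")
print(" ".join(cmd))
# barrnap writes GFF3 to standard output, so a run would redirect it:
#   with open("rrna.gff", "w") as out:
#       subprocess.run(cmd, stdout=out, check=True)
```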
Results
1. rrna.gff
For example
##gff-version 3
tig00000462_pilon_pilon barrnap:0.9 rRNA 585562 585672 2.7e-13 - . Name=5S_rRNA;product=5S ribosomal RNA
tig00000462_pilon_pilon barrnap:0.9 rRNA 585767 588884 0 - . Name=23S_rRNA;product=23S ribosomal RNA
tig00000462_pilon_pilon barrnap:0.9 rRNA 589164 590687 0 - . Name=16S_rRNA;product=16S ribosomal RNA
2. rna.fasta
For example
>tig00000462_pilon_pilon:585561-585672
GGCGGCGTCCTACTCTCCCACAGGGTCCCCCCTGCAGTACCATCGGCGCTGAAAGGCTTAGCT
TCCGGGTTCGGAATGTAACCGGGCGTTTCCCTAACGCTATAACCACCG
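Note that the 5S rRNA starts at 585562 in the GFF but at 585561 in the FASTA header. This is consistent with converting the 1-based, inclusive GFF interval into a 0-based, half-open one before slicing. The extraction script itself is not shown in this document, so the following is only a sketch of that conversion on a toy genome. The function name `extract_feature` is hypothetical, and reverse-complementing minus-strand features is an assumption about what the script does.

```python
# Translation table for complementing DNA (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_feature(genome, seqid, start, end, strand="+"):
    """Cut a GFF feature (1-based, inclusive) out of a dict of sequences."""
    zero_start = start - 1                 # 1-based -> 0-based start
    seq = genome[seqid][zero_start:end]    # the end stays the same (half-open)
    if strand == "-":
        # Assumption: minus-strand features are reverse-complemented
        seq = seq.translate(COMPLEMENT)[::-1]
    header = f"{seqid}:{zero_start}-{end}"
    return header, seq


genome = {"chr": "AACCGTTT"}  # toy sequence, not real data
print(extract_feature(genome, "chr", 3, 6, "+"))
print(extract_feature(genome, "chr", 3, 6, "-"))
```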
5. Repeat detection
Introduction
RepeatMasker
Developed by Arian Smit and Robert Hubley
Please refer to: Smit, AFA, Hubley, R. & Green, P "RepeatMasker" at
http://www.repeatmasker.org
RepeatMasker is a program that screens DNA sequences for interspersed
repeats and low complexity DNA sequences. The output of the program is
a detailed annotation of the repeats that are present in the query
sequence as well as a modified version of the query sequence in which
all the annotated repeats have been masked (default: replaced by
Ns). Sequence comparisons in RepeatMasker are performed by the program
cross_match, an efficient implementation of the Smith-Waterman-Gotoh
algorithm developed by Phil Green, or by WU-Blast developed by Warren
Gish.
Input
FASTA format (may contain multiple sequences), for example:
>seq1
AAAAAAAAGCTACTTGGAGTACCAATAATAAAGTGAGCCCACCTTCCTGGTACCCAGACATTTC
Results
*.tbl summary of the repeat prediction results
*.masked copy of the input sequence with the repeats masked
*.out tabulated repeat prediction results
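Because masked repeats are replaced by Ns by default, the `*.masked` file gives a quick way to estimate how much of the input was masked. A minimal sketch, with the hypothetical function name `masked_fraction`. It assumes default hard masking and counts any N that was already present in the input as masked too.

```python
def masked_fraction(fasta_text):
    """Return the fraction of bases that are N in FASTA-formatted text."""
    total = 0
    masked = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        masked += line.upper().count("N")
    return masked / total if total else 0.0


# Toy example, not real RepeatMasker output
example = ">seq1\nACGTNNNNACGT\nNNNN\n"
print(masked_fraction(example))
```

For an exact figure, the numbers reported in the `*.tbl` summary should take precedence over this estimate.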
6. Gene annotation
Introduction
**EggNOG-mapper** is a tool for fast functional annotation of novel sequences. It uses precomputed orthologous groups and phylogenies from the eggNOG database (http://eggnog5.embl.de) to transfer functional information from fine-grained orthologs only.
Common uses of eggNOG-mapper include the annotation of novel genomes, transcriptomes or even metagenomic gene catalogs.
The use of orthology predictions for functional annotation permits a higher precision than traditional homology searches (i.e. BLAST searches), as it avoids transferring annotations from close paralogs (duplicate genes with a higher chance of being involved in functional divergence).
Benchmarks comparing different eggNOG-mapper options against BLAST and InterProScan [can be found here](https://github.com/jhcepas/emapper-benchmark/blob/master/benchmark_analysis.ipynb).
EggNOG-mapper is also available as a public online resource: http://eggnog-mapper.embl.de
# Documentation
https://github.com/jhcepas/eggnog-mapper/wiki
If you use this software, please cite:
[1] eggNOG-mapper v2: functional annotation, orthology assignments, and domain
prediction at the metagenomic scale. Carlos P. Cantalapiedra,
Ana Hernandez-Plaza, Ivica Letunic, Peer Bork, Jaime Huerta-Cepas. 2021.
Molecular Biology and Evolution, msab293, https://doi.org/10.1093/molbev/msab293
[2] eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated
orthology resource based on 5090 organisms and 2502 viruses. Jaime
Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia
K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars
J Jensen, Christian von Mering, Peer Bork Nucleic Acids Res. 2019 Jan 8;
47(Database issue): D309–D314. doi: 10.1093/nar/gky1085
Input
Protein sequence file for the genes (FASTA format)
For example:
>geneName1
MKLLAHILCLSLALAWAQSQDHALAVLDRCEGLEMDAVAVNEEGIPYFFKGDHLFKGFHG
>geneName2
MWVGEERFEGSRLVVVTRGAVSVGGEGVEDVGGGAVWGLVRSAQSEHPGRFVLVDADVDA
DVDTGVVPDVVGLGESQVAVRGGRVWVPRLVGVNSGGGVRAGGGVVRRGLGSGVALVTGG
TGLLGGLVARHLVSAYGVGELVLVSRRGPGAPGVGALVGELEELGAGVRVVACDVADRGA
VAELVGSIEGLRVVVHAAGAVDDGVIGSLDGGRLRGVMGPKAWGAWHLHELTSGLDLS
Results
Annotation results table
Example format:
#query seed_ortholog evalue score eggNOG_OGs max_annot_lvl COG_category Description Preferred_name GOs EC KEGG_ko KEGG_Pathway KEGG_Module KEGG_Reaction KEGG_rclass BRITE KEGG_TC CAZy BiGG_Reaction PFAMs
geneName3 494419.ALPM01000100_gene1074 4.15e-05 48.9 COG0747@1|root,COG0747@2|Bacteria,2GM5G@201174|Actinobacteria 201174|Actinobacteria E ABC transporter substrate-binding protein - - - ko:K02035 ko02024,map02024 M00239 - - ko00000,ko00001,ko00002,ko02000 3.A.1.5 - - SBP_bac_5
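The annotation table can be loaded into per-gene records keyed by the column names in the `#query` header line. The sketch below assumes the file is tab-separated, that lines starting with `##` are metadata comments, and that `-` marks an empty field. The function names and the reduced column set in the example are hypothetical.

```python
def read_annotations(lines):
    """Parse eggNOG-mapper annotation lines into a list of dicts."""
    header = None
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or metadata comment
        if line.startswith("#"):
            # Header line, e.g. "#query<TAB>seed_ortholog<TAB>..."
            header = line.lstrip("#").split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        records.append(dict(zip(header, line.split("\t"))))
    return records


def split_multi(value):
    """Split a comma-separated field; '-' means no annotation."""
    return [] if value == "-" else value.split(",")


# Reduced column set for illustration (assumed tab-separated)
lines = [
    "#query\tseed_ortholog\tevalue\tCOG_category\tKEGG_ko\tBRITE\tCAZy\tPFAMs",
    "geneName3\t494419.ALPM01000100_gene1074\t4.15e-05\tE\tko:K02035\t"
    "ko00000,ko00001,ko00002,ko02000\t-\tSBP_bac_5",
]
rec = read_annotations(lines)[0]
print(rec["query"], split_multi(rec["BRITE"]), split_multi(rec["CAZy"]))
```

Splitting on tabs only keeps free-text fields such as Description intact, since they contain spaces.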