1. FASTA format
2. GBK format
3. isescan
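Introduction
Insertion sequences (IS) are the smallest but most abundant mobile elements in prokaryotic genomes.
ISEScan identifies IS elements in prokaryotic genomes.
Input
Genome sequence:
FASTA format, for example: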
>tig00000462_pilon_pilon
CCAGCACCGGGTGGCCGTTGCGCCGCGCGTCCGACAGCCGCTCCAGCAGCAGCATGCCCACACCCTCGCCCCAGCCCGTG
CCGTCCGCGGCGGCCGCGAACGCCTTGCACCGGCCGTCCGGCGCGAGACCCCGCTGACGGCTGAACTCCACGAAGAGGCC
GGGCGTGGACATCACGGTCACGCCACCGGCGAGGGCGAGCGGGCACTCCCCCTGCCGCAGGGACTGCACCGCGAGATGCA
GTGCCACCAGCGACGACGAGCAGGCGGTGTCCACCGTCACCGCGGGCCCCTCGAAGCCGAAGGTGTAGGAGAGCCGACCC
GACATGACGCTGGCCGCGCCGCTGGT...
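A quick sanity check on the input is to confirm that the FASTA file parses and to look at each contig's length and GC content. Below is a minimal, stdlib-only Python sketch; `read_fasta` and `gc_content` are hypothetical helper names, not part of ISEScan, and the example uses only the first 20 bases of the contig above.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record ID is the first whitespace-delimited token after '>';
    sequence lines are concatenated and upper-cased.
    """
    chunks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            chunks[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks[current].append(line.upper())
    return {rid: "".join(parts) for rid, parts in chunks.items()}


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0


# Usage with the first 20 bases of the example contig above:
example = """>tig00000462_pilon_pilon
CCAGCACCGG
GTGGCCGTTG
"""
for rid, seq in read_fasta(example.splitlines()).items():
    print(rid, len(seq), round(gc_content(seq), 3))
```

The same reader applies to the minCED input in section 6, which is also a FASTA genome sequence.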
Output
Result table file
For example:
>tig00000462_pilon_pilon
1 38904 - 2 1.213472 I:942,38813, D:
38989 40625 - 1 1.194179 I: D:40057,
40644 40814 + 3 1.324823 I: D:
40848 41092 + 3 1.372382 I: D:41080,
41221 42474 - 1 1.216103 I: D:
42729 45808 + 3 1.183207 I: D:44266,
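The rows above look like FragGeneScan gene-prediction output, which ISEScan produces while searching for IS elements: a `>` line names the sequence, and each following row gives start, end, strand, reading frame, a score, and comma-separated positions of inferred insertions (`I:`) and deletions (`D:`). This reading of the columns is an assumption, as are the helper names. A minimal Python sketch that parses such rows:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GenePrediction:
    start: int
    end: int
    strand: str  # '+' or '-'
    frame: int
    score: float
    insertions: List[int] = field(default_factory=list)  # positions after 'I:'
    deletions: List[int] = field(default_factory=list)   # positions after 'D:'


def _positions(token: str, prefix: str) -> List[int]:
    """'I:942,38813,' -> [942, 38813]; a bare 'D:' -> []."""
    if not token.startswith(prefix):
        raise ValueError(f"expected a {prefix!r} field, got {token!r}")
    return [int(p) for p in token[len(prefix):].split(",") if p]


def parse_gene_table(text: str) -> Dict[str, List[GenePrediction]]:
    """Group gene rows under the sequence named by the preceding '>' line."""
    genes: Dict[str, List[GenePrediction]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            genes[current] = []
            continue
        if current is None:
            raise ValueError("gene row found before any '>' header")
        # Assumed column order: start end strand frame score I:... D:...
        start, end, strand, frame, score, ins, dels = line.split()
        genes[current].append(GenePrediction(
            int(start), int(end), strand, int(frame), float(score),
            _positions(ins, "I:"), _positions(dels, "D:"),
        ))
    return genes


example = """>tig00000462_pilon_pilon
1 38904 - 2 1.213472 I:942,38813, D:
38989 40625 - 1 1.194179 I: D:40057,
"""
for seq_id, rows in parse_gene_table(example).items():
    for g in rows:
        print(seq_id, g.start, g.end, g.strand, len(g.insertions), len(g.deletions))
```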
4. islandpath
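Introduction
The islandpath software predicts genomic islands in prokaryotic genomes. It takes a GBK file as input.
Input
GBK file:
Example of the format: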
LOCUS NC_003210 2944528 bp DNA circular CON 17-DEC-2014
DEFINITION Listeria monocytogenes EGD-e chromosome, complete genome.
ACCESSION NC_003210
VERSION NC_003210.1 GI:16802048
DBLINK BioProject: PRJNA61583
KEYWORDS RefSeq.
SOURCE Listeria monocytogenes EGD-e
ORGANISM Listeria monocytogenes EGD-e
Bacteria; Firmicutes; Bacilli; Bacillales; Listeriaceae; Listeria.
REFERENCE 1 (bases 1 to 2944528)
AUTHORS Toledo-Arana,A., Dussurget,O., Nikitas,G., Sesto,N.,
Guet-Revillet,H., Balestrino,D., Loh,E., Gripenland,J., Tiensuu,T.,
Vaitkevicius,K., Barthelemy,M., Vergassola,M., Nahori,M.A.,
Soubigou,G., Regnault,B., Coppee,J.Y., Lecuit,M., Johansson,J. and
Cossart,P.
TITLE The Listeria transcriptional landscape from saprophytism to
virulence
JOURNAL Nature 459 (7249), 950-956 (2009)
PUBMED 19448609
REFERENCE 2 (bases 1 to 2944528)
AUTHORS Chatterjee,S.S., Hossain,H., Otten,S., Kuenne,C., Kuchmina,K.,
Machata,S., Domann,E., Chakraborty,T. and Hain,T.
TITLE Intracellular gene expression profile of Listeria monocytogenes
JOURNAL Infect. Immun. 74 (2), 1323-1338 (2006)
PUBMED 16428782
REFERENCE 3 (bases 1 to 2944528)
AUTHORS Glaser,P., Frangeul,L., Buchrieser,C., Amend,A., Baquero,F.,
Berche,P., Bloecker,H., Brandt,P., Chakraborty,T., Charbit,A.,
Chetouani,F., Couve,E., de Daruvar,A., Dehoux,P., Domann,E.,
Dominguez-Bernal,G., Duchaud,E., Durand,L., Dussurget,O.,
Entian,K.-D., Fsihi,H., Garcia-Del Portillo,F., Garrido,P.,
Gautier,L., Goebel,W., Gomez-Lopez,N., Hain,T., Hauf,J.,
Jackson,D., Jones,L.-M., Karst,U., Kreft,J., Kuhn,M., Kunst,F.,
Kurapkat,G., Madueno,E., Maitournam,A., Mata Vicente,J., Ng,E.,
Nordsiek,G., Novella,S., de Pablos,B., Perez-Diaz,J.-C., Remmel,B.,
Rose,M., Rusniok,C., Schlueter,T., Simoes,N., Tierrez,A.,
Vazquez-Boland,J.-A., Voss,H., Wehland,J. and Cossart,P.
TITLE Comparative genomics of Listeria species
JOURNAL Science 294 (5543), 849-852 (2001)
PUBMED 11679669
REFERENCE 4 (bases 1 to 2944528)
CONSRTM NCBI Genome Project
TITLE Direct Submission
JOURNAL Submitted (08-NOV-2001) National Center for Biotechnology
Information, NIH, Bethesda, MD 20894, USA
REFERENCE 5 (bases 1 to 2944528)
AUTHORS Glaser,P., Frangeul,L. and Rusniok,C.
TITLE Direct Submission
JOURNAL Submitted (06-JUN-2001) Glaser P., Institut Pasteur, Genomique des
Microorganismes Pathogenes, 25 rue du Docteur Roux, 75724 Paris
Cedex 15, FRANCE
COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AL591824.
RefSeq Category: Reference Genome
FGS: First Genome sequenced
MOD: Model Organism
UPR: UniProt Genome
COMPLETENESS: full length.
FEATURES Location/Qualifiers
source 1..2944528
/organism="Listeria monocytogenes EGD-e"
/mol_type="genomic DNA"
/strain="EGD-e"
/db_xref="taxon:169963"
regulatory 305..310
/regulatory_class="ribosome_binding_site"
operon 318..3012
/operon="Operon_001"
/experiment="EXISTENCE:[PMID:19448609]"
gene 318..1673
/gene="dnaA"
/locus_tag="lmo0001"
/operon="Operon_001"
/experiment="EXISTENCE:[PMID:19448609]"
/db_xref="GeneID:984365"
CDS 318..1673
/gene="dnaA"
/locus_tag="lmo0001"
/operon="Operon_001"
/experiment="EXISTENCE:[PMID:19448609]"
/note="binds to the dnaA-box as an ATP-bound complex at
the origin of replication during the initiation of
chromosomal replication; can also affect transcription of
multiple genes including itself."
/codon_start=1
/transl_table=11
/product="chromosome replication initiator DnaA"
/protein_id="NP_463534.1"
/db_xref="GI:16802049"
/db_xref="GeneID:984365"
/translation="MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIIS
APNNFVRDWLEKSYTQFIANILQEITGRLFDVRFIDGEQEENFEYTVIKPNPALDEDG
IEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPAKAYNPLFIYGGVGLGKTHLM
HAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVLLIDDIQFL
AGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPD
LETRIAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITA
GLAAEALKDIIPSSKSQVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYL
SRELTDASLPKIGDEFGGRDHTTVIHAHEKISQLLKTDQVLKNDLAEIEKNLRKAQNM
F"
......
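Both islandpath and phispy read this GenBank flat file. To see which features a record contains, here is a minimal, stdlib-only Python sketch that walks the FEATURES table. It assumes the standard fixed-column GenBank layout (feature keys indented 5 spaces, qualifiers indented 21 spaces), which the example above has lost in extraction; `parse_features` and `summarize_cds` are hypothetical helpers, not part of either tool.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed GenBank flat-file layout: feature keys after a 5-space indent,
# qualifiers after a 21-space indent.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)")
QUALIFIER_RE = re.compile(r"^ {21}/([^=\s]+)(?:=(.*))?$")

Feature = Tuple[str, str, Dict[str, str]]  # (key, location, qualifiers)


def parse_features(text: str) -> List[Feature]:
    """Collect the FEATURES table of a single-record GenBank string."""
    features: List[Feature] = []
    in_table = False
    last_qual: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        if not line[0].isspace():
            break  # ORIGIN, CONTIG or '//' ends the table
        feat = FEATURE_RE.match(line)
        if feat:
            features.append((feat.group(1), feat.group(2), {}))
            last_qual = None
            continue
        qual = QUALIFIER_RE.match(line)
        if qual and features:
            last_qual = qual.group(1)
            features[-1][2][last_qual] = (qual.group(2) or "").strip().strip('"')
        elif last_qual and features:
            # Continuation of a multi-line value; /translation joins without spaces.
            sep = "" if last_qual == "translation" else " "
            features[-1][2][last_qual] += sep + line.strip().strip('"')
    return features


def summarize_cds(features: List[Feature]) -> List[Tuple[str, str, str]]:
    """(locus_tag, location, product) for every CDS feature."""
    return [(q.get("locus_tag", "?"), loc, q.get("product", ""))
            for key, loc, q in features if key == "CDS"]
```

For a real file, pass `open(path).read()` to `parse_features`. Multi-record files and locations that wrap onto a second line are not handled; Biopython's `SeqIO.parse(path, "genbank")` is the more robust choice when it is available.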
Output
Genomic island coordinate file
For example:
GI_1 351459 374596
GI_2 496381 520791
GI_3 1131558 1161675
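A minimal Python sketch for working with this file: it reads the island coordinates, computes their lengths, and picks out genes lying wholly inside an island. It assumes whitespace-separated `ID start end` columns with 1-based inclusive coordinates, which the source does not state; the helper names are hypothetical, and using the LOCUS length above as the genome size assumes the islands come from that record.

```python
from typing import Iterable, List, Tuple

Island = Tuple[str, int, int]  # (island_id, start, end)


def read_islands(text: str) -> List[Island]:
    """Parse whitespace-separated 'ID start end' lines; skip short/blank lines."""
    islands: List[Island] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        islands.append((fields[0], int(fields[1]), int(fields[2])))
    return islands


def island_length(start: int, end: int) -> int:
    """Length in bp, assuming 1-based, inclusive coordinates."""
    return end - start + 1


def genes_in_island(genes: Iterable[Tuple[str, int, int]],
                    start: int, end: int) -> List[str]:
    """IDs of genes, given as (id, start, end), lying wholly inside [start, end]."""
    return [gid for gid, g_start, g_end in genes if start <= g_start and g_end <= end]


example = """GI_1 351459 374596
GI_2 496381 520791
GI_3 1131558 1161675
"""
islands = read_islands(example)
total = sum(island_length(s, e) for _, s, e in islands)
# Assumption: the islands refer to the NC_003210 record above (LOCUS length 2944528 bp).
genome_length = 2944528
print(f"{len(islands)} islands, {total} bp, {total / genome_length:.2%} of the genome")
```

Gene coordinates for `genes_in_island` could come from the CDS locations of the same GBK file.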
5. phispy
Introduction
The phispy software predicts prophages in genome sequences.
Input
GBK file
For example:
LOCUS ntrd02_1 4133097 bp DNA circular BCT 06-MAR-2006
DEFINITION Roseobacter denitrificans. 4133097 bp, complete sequence.
ACCESSION NC_008209
VERSION NC_008209.0
KEYWORDS HTG.
SOURCE Roseobacter denitrificans
ORGANISM Roseobacter denitrificans
Bacteria; Proteobacteria; Alphaproteobacteria; Rhodobacterales;
Rhodobacteraceae; Roseobacter.
REFERENCE 1 (bases 1 to 4133097)
AUTHORS Swingley,W.D., Gholba,S., Mastrian,S.D., Matthies,H.J., Hao,J.,
Ramos,H., Acharya,C.R., Conrad,A.L., Taylor,H.L., Dejesa,L.C.,
Shah,M.K., O'Huallachain,M.E., Lince,M.T., Blankenship,R.E.,
Beatty,J.T. and Touchman,J.W.
TITLE A ubiquitous marine phototroph with a novel carbon-fixation pathway
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 4133097)
AUTHORS Touchman,J.W.
TITLE Direct Submission
JOURNAL Submitted (01-MAR-2006) Pathogen Genomics Division, Translational
Genomics Research Institute, 445 N. Fifth Street, Phoenix, Arizona
85004, USA
FEATURES Location/Qualifiers
source 1..4133097
/organism="Roseobacter denitrificans"
/mol_type="genomic DNA"
/strain="OCh 114"
/db_xref="taxon:2434"
/chromosome="Chromosome"
gene 1723..2871
/locus_tag="RD0003"
CDS 1723..2871
/locus_tag="RD0004"
/EC_number="2.1.2.10"
/note="no close characterized matches identified by match
to protein family HMM PF01571"
/codon_start=1
/transl_table=11
/product="aminomethyltransferase, putative"
/translation="MAIIYRTSALAQRHAEIGGELEDWNGMGTAWFYDHSDERAKADY
EAVRTKAGLMDVSGLKKIHLSGPHAAAVIDRATTRNVDKLMPGRAVYAAMLDDRGLFI
DDCVIYRLSVNNWLLVHGTGTGHESLAMAAYGKNVSMIFDDDLHDMSLQGPVAVDFLA
KHVPGIRDLAYFGIIQTKLFGMPVMISRTGYTGERGYEIFCEGRHAIALWDAILEDGK
DMGIRPVQFSTLDLLRTESYLLFYPGDNSETYPFENGAACGDSLWELGLEFTVSPGKT
GFRGAENHYALEGKERFKIYGVRLEGTTAADEGADLLKDGEKVGVVTYGMRSDLFDHT
VGIARMPVECATPGTKMTVRNGDGTEIPCVAEEMPFYDKDKAIRTAKG"
......
Output
Result table file
prophage_coordinates.tsv
For example:
pp1 NC_002737 529631 569288 529591 529606 570494 570509 CATGTACAACTATAC CATGTACAACTATAC Longest Repeat flanking phage and within 2000 bp
pp2 NC_002737 778642 820599 778526 778576 820960 821010 AAACTCAAGAAGTGATTAAATAAAACATTAAAGAACCTTGTCATATCAAC AAACTCAAGAAGTGATTAAATAAAACATTAAAGAACCTTGTCATATCAAC Longest Repeat flanking phage and within 2000 bp
pp3 NC_002737 1191309 1222549 1193572 1193583 1220349 1220360 TCAGATTTTTT AAAAAATCTGA Longest Repeat flanking phage and within 2000 bp
pp4 NC_002737 1775862 1785658 1774377 1774389 1782817 1782829 AAATGACTAAGT ACTTAGTCATTT Longest Repeat flanking phage and within 2000 bp
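Per the PhiSpy README, the columns of prophage_coordinates.tsv are: prophage number, contig, prophage start and end, attL start and end, attR start and end, attL sequence, attR sequence, and a note on how the att sites were chosen. Treat these column names as an assumption to check against your PhiSpy version. The minimal Python sketch below loads the rows (tab-separated, or space-separated as in the copy above) and classifies the flanking repeats; in pp3 and pp4 above, attR is the reverse complement of attL rather than a direct copy. The helper names are hypothetical.

```python
from typing import Dict, List

# Column names paraphrase the PhiSpy README (assumption; verify for your version).
COLUMNS = [
    "prophage", "contig", "start", "end",
    "attL_start", "attL_end", "attR_start", "attR_end",
    "attL_seq", "attR_seq", "att_note",
]
INT_COLUMNS = ("start", "end", "attL_start", "attL_end", "attR_start", "attR_end")
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_prophages(text: str) -> List[Dict[str, object]]:
    """One dict per prophage row; empty coordinate fields become None."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if "\t" in line:
            fields = line.split("\t")  # real TSV: empty att fields keep their place
        else:
            # Space-separated copy: keep the free-text last column intact.
            fields = line.split(None, len(COLUMNS) - 1)
        row: Dict[str, object] = dict(zip(COLUMNS, fields))
        for col in INT_COLUMNS:
            if col in row:
                row[col] = int(row[col]) if row[col] else None
        rows.append(row)
    return rows


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def att_orientation(row: Dict[str, object]) -> str:
    """'direct' if attR == attL, 'inverted' if attR is attL's reverse complement."""
    left, right = row.get("attL_seq"), row.get("attR_seq")
    if not left or not right:
        return "missing"
    if left == right:
        return "direct"
    if right == revcomp(str(left)):
        return "inverted"
    return "other"


example = "\n".join([
    "pp1 NC_002737 529631 569288 529591 529606 570494 570509 CATGTACAACTATAC "
    "CATGTACAACTATAC Longest Repeat flanking phage and within 2000 bp",
    "pp3 NC_002737 1191309 1222549 1193572 1193583 1220349 1220360 TCAGATTTTTT "
    "AAAAAATCTGA Longest Repeat flanking phage and within 2000 bp",
])
for row in read_prophages(example):
    print(row["prophage"], row["end"] - row["start"] + 1, att_orientation(row))
```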
6. minced
Introduction
The minCED software finds CRISPRs. It can be run on both genomic and metagenomic data.
Input
Genome sequence:
For example:
>tig00000462_pilon_pilon
CCAGCACCGGGTGGCCGTTGCGCCGCGCGTCCGACAGCCGCTCCAGCAGCAGCATGCCCACACCCTCGCCCCAGCCCGTG
CCGTCCGCGGCGGCCGCGAACGCCTTGCACCGGCCGTCCGGCGCGAGACCCCGCTGACGGCTGAACTCCACGAAGAGGCC
GGGCGTGGACATCACGGTCACGCCACCGGCGAG...
Output
Example output file:
Sequence 'tig00000462_pilon_pilon' (8900851 bp)
CRISPR 1 Range: 79635 - 79720
POSITION REPEAT SPACER
-------- ----------------------- ----------------------------------------
79635 CTTCGCCCTCGCCGTGGCCGCCT TCGCCCTCACCCTGGCGGCCTTCGCCCTCACCCCGGCCGC [ 23, 40 ]
79698 CTTCGCCCTCGCCGTGGCCGCCT
-------- ----------------------- ----------------------------------------
Repeats: 2 Average Length: 23 Average Length: 40
CRISPR 2 Range: 471329 - 471425
POSITION REPEAT SPACER
-------- ---------------------------------- -----------------------------
471329 CGCTGACGCCGGTATCGGTGCCGCTGACGCCGGT ATCGGTGCCGCTGACGCCGGTATCGGTGC [ 34, 29 ]
471392 CGCTGACGCCGGTATCGGTGCCGCTGACGCCGGT
-------- ---------------------------------- -----------------------------
Repeats: 2 Average Length: 34 Average Length: 29
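A minimal Python sketch that turns a minCED report like the one above into structured records. The line patterns are inferred from this sample only, and `parse_minced` and `CrisprArray` are hypothetical names. As a consistency check on the sample, each Range ends where its last repeat ends: 79698 + 23 - 1 = 79720 and 471392 + 34 - 1 = 471425.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Line patterns assumed from the sample report above.
SEQ_RE = re.compile(r"^Sequence '([^']+)'")
RANGE_RE = re.compile(r"^CRISPR\s+(\d+)\s+Range:\s+(\d+)\s+-\s+(\d+)")
ROW_RE = re.compile(r"^(\d+)\s+([ACGTN]+)(?:\s+([ACGTN]+))?", re.IGNORECASE)


@dataclass
class CrisprArray:
    sequence: Optional[str]
    number: int
    start: int
    end: int
    positions: List[int] = field(default_factory=list)
    repeats: List[str] = field(default_factory=list)
    spacers: List[str] = field(default_factory=list)


def parse_minced(text: str) -> List[CrisprArray]:
    """Collect every CRISPR array, its repeats and its spacers from a report."""
    arrays: List[CrisprArray] = []
    seq_id: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        seq = SEQ_RE.match(line)
        if seq:
            seq_id = seq.group(1)
            continue
        rng = RANGE_RE.match(line)
        if rng:
            arrays.append(CrisprArray(seq_id, int(rng.group(1)),
                                      int(rng.group(2)), int(rng.group(3))))
            continue
        row = ROW_RE.match(line)
        if row and arrays:
            arrays[-1].positions.append(int(row.group(1)))
            arrays[-1].repeats.append(row.group(2).upper())
            if row.group(3):  # the last repeat of an array has no spacer
                arrays[-1].spacers.append(row.group(3).upper())
    return arrays
```

To use it on a report file, call `parse_minced(open(path).read())`; header, separator and summary lines are skipped because they match none of the patterns.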