Use the cd-hit program to remove redundant sequences from a FASTA file
[ References: Clustering of highly homologous sequences to reduce the size of large protein databases, Weizhong Li, Lukasz Jaroszewski & Adam Godzik. Bioinformatics, (2001) 17:282-283; Tolerating some redundancy significantly speeds up clustering of large protein databases, Weizhong Li, Lukasz Jaroszewski & Adam Godzik. Bioinformatics, (2002) 18:77-82 ]
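
As a minimal sketch of how this step might be scripted, the Python below assembles a cd-hit command line for a protein FASTA file and runs it. The file names, the default 0.9 identity threshold, and the helper names (pick_word_size, build_cd_hit_command, run_cd_hit) are assumptions made for illustration. Only the cd-hit options -i (input), -o (output), -c (sequence identity threshold) and -n (word size) are used, and the word-size table follows the ranges recommended in the cd-hit user guide for protein sequences.

```python
import shutil
import subprocess


def pick_word_size(identity):
    """Return the -n word size suggested for a protein identity threshold.

    Ranges follow the cd-hit user guide: 5 for 0.7-1.0, 4 for 0.6-0.7,
    3 for 0.5-0.6, 2 for 0.4-0.5.
    """
    if not 0.4 <= identity <= 1.0:
        raise ValueError("cd-hit protein clustering expects 0.4 <= identity <= 1.0")
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    return 2


def build_cd_hit_command(input_fasta, output_fasta, identity=0.9):
    """Build the argument list for a cd-hit run (hypothetical helper)."""
    return [
        "cd-hit",
        "-i", input_fasta,                    # input FASTA file
        "-o", output_fasta,                   # non-redundant output FASTA
        "-c", str(identity),                  # sequence identity threshold
        "-n", str(pick_word_size(identity)),  # word size matched to -c
    ]


def run_cd_hit(input_fasta, output_fasta, identity=0.9):
    """Run cd-hit if it is installed; raise if it is missing or fails."""
    if shutil.which("cd-hit") is None:
        raise FileNotFoundError("cd-hit executable not found on PATH")
    command = build_cd_hit_command(input_fasta, output_fasta, identity)
    subprocess.run(command, check=True)
```

Calling run_cd_hit("proteins.fasta", "proteins_nr.fasta") (both file names are placeholders) would write the representative sequences to the output file; cd-hit also writes a cluster listing next to it with a .clstr suffix. Nucleotide sequences are handled by the separate cd-hit-est program, which this sketch does not cover.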