手动寻找cripsr 引物比较麻烦,而现在有些网站可以完成这一任务,但是,用python 去实现它也很简单。以下是脚本:
1 #!/usr/bin/python 2 # list all crispr-target(20 bp + NGG) 3 4 import re 5 from Bio.Seq import reverse_complement 6 7 genome_seq = open("seq.txt") 8 crispr_list = open("crispr_list.txt", "a") 9 10 sequence = "" 11 for line in genome_seq: 12 sequence = sequence + line.rstrip("\n") 13 14 def GC_number(seq): 15 num_G = seq.count(‘G‘) 16 num_C = seq.count(‘C‘) 17 GC_number = num_G + num_C 18 return GC_number 19 20 for i in range(0, len(sequence)): 21 target = sequence[i:i+23] 22 23 if re.search(r‘[ATCG]{21}GG$‘, target): 24 start = i + 1 25 end = i + 20 26 target = "+ " + target + " from " + str(start) + " to " + str(end) + "---" + "GC_number = " + str(GC_number(target)) 27 crispr_list.write(target + "\n") 28 #print(target) 29 30 RC_sequence = reverse_complement(sequence) 31 32 for i in range(0, len(RC_sequence)): 33 target = RC_sequence[i:i+23] 34 35 if re.search(r‘[ATCG]{21}GG$‘, target): 36 start = i + 1 37 end = i + 20 38 target = "- " + target + " from " + str(start) + " to " + str(end) + "---" + "GC_number = " + str(GC_number(target)) 39 crispr_list.write(target + "\n") 40 #print(target) 41 42 genome_seq.close() 43 crispr_list.close()
理论上,这正则表达式更直接和简单,但是以下代码的结果远远少于上面方法的结果:
1 dna = sequence 2 runs = re.findall(r‘[ATCG]{21}GG‘, dna) 3 for match in runs: 4 #print(str(match))
目前还不是很清楚原因。
原文:http://www.cnblogs.com/OA-maque/p/4808859.html