Repeated DNA Sequences
要点:可以用rolling hash做,不过只beat了7%的python解法,还是用bit数组快。rolling hash还是就用左边高位吧,别老变来变去的,这样每次取余去掉高位。
错误点:rolling hash:rolling的公式可不是+=All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",Return:["AAAAACCCCC", "CCCCCAAAAA"].
class Solution(object): def findRepeatedDnaSequences(self, s): """ :type s: str :rtype: List[str] """ n = len(s) res = [] if n<11: return res hmap = {} hval = 0 cmap = {'A':0,'C':1,'G':2,'T':3} for i in xrange(10): hval = (hval<<2 & 0xfffff) + cmap[s[i]] hmap[hval]=1 # print bin(hval) for i in xrange(10, n): hval = (hval<<2 & 0xfffff) + cmap[s[i]] if hval in hmap and hmap[hval]==1: res.append(s[i-9:i+1]) if hval not in hmap: hmap[hval]=0 hmap[hval]+=1 return res