Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
- Time: O(n)
- Space: O(n)
public List<String> findRepeatedDnaSequences(String s) {
Set<Integer> once = new HashSet<>();
Set<Integer> second = new HashSet<>();
List<String> ret = new ArrayList<>();
int[] map = new int[26];
map['A' - 'A'] = 0;
map['C' - 'A'] = 1;
map['G' - 'A'] = 2;
map['T' - 'A'] = 3;
for (int i = 0; i <= s.length() - 10; i++) {
int bit = 0;
for (int j = i; j < i + 10; j++) {
bit <<= 2;
bit |= map[s.charAt(j) - 'A'];
}
if (!once.add(bit) && second.add(bit)) {
ret.add(s.substring(i, i + 10));
}
}
return ret;
}