Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

  • Time: O(n)
  • Space: O(n)
public List<String> findRepeatedDnaSequences(String s) {
    Set<Integer> once = new HashSet<>();
    Set<Integer> second = new HashSet<>();
    List<String> ret = new ArrayList<>();
    int[] map = new int[26];
    map['A' - 'A'] = 0;
    map['C' - 'A'] = 1;
    map['G' - 'A'] = 2;
    map['T' - 'A'] = 3;
    for (int i = 0; i <= s.length() - 10; i++) {
        int bit = 0;
        for (int j = i; j < i + 10; j++) {
            bit <<= 2;
            bit |= map[s.charAt(j) - 'A'];
        }
        if (!once.add(bit) && second.add(bit)) {
            ret.add(s.substring(i, i + 10));
        }
    }
    return ret;
}

results matching ""

    No results matching ""