Repeated DNA Sequences Problem
Description
LeetCode Problem 187.
The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.
- For example, “ACGAATTCCG” is a DNA sequence.
When studying DNA, it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example 1:
1
2
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]
Example 2:
1
2
Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]
Constraints:
- 1 <= s.length <= 10^5
- s[i] is either ‘A’, ‘C’, ‘G’, or ‘T’.
Sample C++ Code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
class Solution {
public:
vector<string> findRepeatedDnaSequences(string s) {
int l = s.size();
vector<string> ans;
if (l < 10)
return ans;
unordered_map<string, int> ht;
string subs;
for (int i = 0; i < l - 10 + 1; i ++) {
subs = s.substr(i, 10);
if (ht.find(subs) == ht.end()) {
ht[subs] = 0;
}
ht[subs] ++;
if (ht[subs] == 2)
ans.push_back(subs);
}
return ans;
}
};