Repeated DNA Sequences Problem


Description

LeetCode Problem 187.

The DNA sequence is composed of a series of nucleotides abbreviated as ‘A’, ‘C’, ‘G’, and ‘T’.

  • For example, “ACGAATTCCG” is a DNA sequence.

When studying DNA, it is useful to identify repeated sequences within the DNA.

Given a string s that represents a DNA sequence, return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.

Example 1:

1
2
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]

Example 2:

1
2
Input: s = "AAAAAAAAAAAAA"
Output: ["AAAAAAAAAA"]

Constraints:

  • 1 <= s.length <= 10^5
  • s[i] is either ‘A’, ‘C’, ‘G’, or ‘T’.


Sample C++ Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        int l = s.size();
        vector<string> ans;
        if (l < 10)
            return ans;
        
        unordered_map<string, int> ht;
        string subs;
        for (int i = 0; i < l - 10 + 1; i ++) {
            subs = s.substr(i, 10);
            if (ht.find(subs) == ht.end()) {
                ht[subs] = 0;
            } 
            ht[subs] ++;
            if (ht[subs] == 2)
                ans.push_back(subs);
                
        }
        return ans;
            
    }
};




Related Posts

X Of A Kind In A Deck Of Cards Problem

LeetCode 914. In a deck of cards, each card has...

Word Subsets Problem

LeetCode 916. You are given two string arrays words1 and...

Vowel Spellchecker Problem

LeetCode 966. Given a wordlist, we want to implement a...

Verifying An Alien Dictionary Problem

LeetCode 953. In an alien language, surprisingly, they also use...

Unique Morse Code Words Problem

LeetCode 804. International Morse Code defines a standard encoding where...

Unique Email Addresses Problem

LeetCode 929. Every valid email consists of a local name...