UTF-8 Validation Problem


Description

LeetCode Problem 393.

Given an integer array data representing the data, return whether it is a valid UTF-8 encoding. A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

  • For a 1-byte character, the first bit is a 0, followed by its Unicode code.
  • For an n-bytes character, the first n bits are all one’s, the n + 1 bit is 0, followed by n - 1 bytes with the most significant 2 bits being 10.

This is how the UTF-8 encoding would work:

1
2
3
4
5
6
7
Char. number range  |        UTF-8 octet sequence
(hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Note: The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data.

Example 1:

1
2
3
4
Input: data = [197,130,1]
Output: true
Explanation: data represents the octet sequence: 11000101 10000010 00000001.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.

Example 2:

1
2
3
4
5
6
Input: data = [235,140,4]
Output: false
Explanation: data represented the octet sequence: 11101011 10001100 00000100.
The first 3 bits are all one's and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that's correct.
But the second continuation byte does not start with 10, so it is invalid.

Constraints:

  • 1 <= data.length <= 2 * 10^4
  • 0 <= data[i] <= 255


Sample C++ Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
class Solution {
public:
    bool validUtf8(vector<int>& data) {
        int count = 0;
        for (auto c : data) {
            if (count == 0) {
                if ((c >> 5) == 0b110) count = 1;
                else if ((c >> 4) == 0b1110) count = 2;
                else if ((c >> 3) == 0b11110) count = 3;
                else if ((c >> 7)) return false;
            } else {
                if ((c >> 6) != 0b10) return false;
                count--;
            }
        }
        return count == 0;
    }
};




Related Posts

Bulb Switcher II Problem

LeetCode 672. There is a room with n bulbs labeled...

Binary Number With Alternating Bits Problem

LeetCode 693. Given a positive integer, check whether it has...

Total Hamming Distance Problem

LeetCode 477. The Hamming distance between two integers is the...

Number Complement Problem

LeetCode 476. The complement of an integer is the integer...

Hamming Distance Problem

LeetCode 461. The Hamming distance between two integers is the...

Convert A Number To Hexadecimal Problem

LeetCode 405. Given an integer num, return a string representing...