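A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:
1. For a 1-byte character, the first bit is 0, followed by its Unicode code.
2. For an n-byte character, the first n bits are all 1s, bit n + 1 is 0, and each of the remaining n - 1 bytes begins with the two bits 10.
This means that the binary representation of the bytes follows one of these patterns, depending on the number of bytes n:
1 byte: 0xxxxxxx
2 bytes: 110xxxxx 10xxxxxx
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx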
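The prefix of a character's first byte determines its length: 0 for 1 byte, 110 for 2 bytes, 1110 for 3 bytes, and 11110 for 4 bytes. A minimal sketch of that classification using right shifts follows. The class name `LeadByte`, the method `sequenceLength`, and the choice to return -1 for a byte that cannot start a character are assumptions made for illustration, and each value is assumed to lie in the range 0 to 255.

```java
class LeadByte {
    // Returns the total length in bytes (1 to 4) of the character that starts
    // with this byte, or -1 if the byte cannot start a character.
    // Assumes b is in the range 0..255.
    static int sequenceLength(int b) {
        if ((b >> 7) == 0b0) return 1;      // 0xxxxxxx
        if ((b >> 5) == 0b110) return 2;    // 110xxxxx
        if ((b >> 4) == 0b1110) return 3;   // 1110xxxx
        if ((b >> 3) == 0b11110) return 4;  // 11110xxx
        return -1;                          // 10xxxxxx or 11111xxx
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(0x41)); // 01000001 -> 1
        System.out.println(sequenceLength(197));  // 11000101 -> 2
        System.out.println(sequenceLength(235));  // 11101011 -> 3
        System.out.println(sequenceLength(240));  // 11110000 -> 4
        System.out.println(sequenceLength(130));  // 10000010 -> -1 (continuation byte)
    }
}
```

Shifting right by 7, 5, 4, or 3 bits leaves only the prefix bits, so each test is a single comparison against the expected pattern.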
Given an array of integers representing the data, return true if the data is a valid UTF-8 encoding and false otherwise.
Here’s the implementation of the UTF-8 validation in Java:
public class Solution {
    public boolean validUtf8(int[] data) {
        // Number of continuation bytes still expected for the current character
        int numberOfBytes = 0;
        for (int i = 0; i < data.length; i++) {
            int num = data[i];
            // If this is a new character, determine how many continuation bytes follow
            if (numberOfBytes == 0) {
                if ((num >> 5) == 0b110) numberOfBytes = 1;        // 110xxxxx
                else if ((num >> 4) == 0b1110) numberOfBytes = 2;  // 1110xxxx
                else if ((num >> 3) == 0b11110) numberOfBytes = 3; // 11110xxx
                else if ((num >> 7) == 0b0) numberOfBytes = 0;     // 0xxxxxxx
                else return false; // Not a valid first byte
            } else {
                // Check if the byte is a valid continuation (10xxxxxx)
                if ((num >> 6) != 0b10) return false;
                numberOfBytes--;
            }
        }
        // All characters should be properly ended
        return numberOfBytes == 0;
    }

    public static void main(String[] args) {
        Solution solution = new Solution();
        int[] data1 = {197, 130, 1};
        System.out.println(solution.validUtf8(data1)); // Output: true
        int[] data2 = {235, 140, 4};
        System.out.println(solution.validUtf8(data2)); // Output: false
    }
}
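The algorithm makes a single pass over the array, so it runs in O(n) time and O(1) extra space, where n is the length of the input array. This solution ensures that the given data adheres to the UTF-8 encoding rules by checking the leading bits of each byte and keeping track of how many continuation bytes are still expected.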
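To see how the count of expected continuation bytes handles edge cases, the sketch below runs the same counter-based idea on a few inputs: a complete 4-byte character, a truncated sequence, a stray continuation byte, an invalid first byte, and an empty array. The class `Utf8Check` and the method `isValid` are hypothetical names introduced for this illustration, and every value is assumed to lie in the range 0 to 255.

```java
class Utf8Check {
    // Hypothetical helper using the same counter-based idea.
    // Assumes every value is in the range 0..255.
    static boolean isValid(int[] data) {
        int pending = 0; // continuation bytes still expected
        for (int b : data) {
            if (pending > 0) {
                if ((b >> 6) != 0b10) return false; // must be 10xxxxxx
                pending--;
            } else if ((b >> 7) == 0) {
                pending = 0;                        // 1-byte character
            } else if ((b >> 5) == 0b110) {
                pending = 1;
            } else if ((b >> 4) == 0b1110) {
                pending = 2;
            } else if ((b >> 3) == 0b11110) {
                pending = 3;
            } else {
                return false;                       // stray 10xxxxxx or 11111xxx
            }
        }
        return pending == 0; // false if the last character is cut short
    }

    public static void main(String[] args) {
        System.out.println(isValid(new int[] {240, 162, 138, 147}));      // true: full 4-byte character
        System.out.println(isValid(new int[] {197}));                     // false: continuation byte missing
        System.out.println(isValid(new int[] {130}));                     // false: starts with a continuation byte
        System.out.println(isValid(new int[] {248, 130, 130, 130, 130})); // false: 11111000 is not a valid first byte
        System.out.println(isValid(new int[] {}));                        // true: nothing left pending
    }
}
```

The final `pending == 0` check is what rejects input that ends in the middle of a multi-byte character.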