algoadvance

Leetcode 393. UTF

Problem Statement

A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

This means that the binary representation of the byte starts with the following pattern for n bytes:

Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.

Clarifying Questions

  1. What data type is the input array?
    • The input is an array of integers.
  2. What is the range of the integers?
    • Each integer in the array is between 0 and 255 (inclusive), representing each byte in the data.
  3. What should the function return if the data is valid or invalid UTF-8?
    • The function should return true if the data is valid UTF-8 encoding and false otherwise.

Strategy

  1. Bit Manipulation:
    • To solve this problem, we will utilize bit manipulation to check the leading bits of each byte.
  2. Main Steps:
    • Initialize a variable to keep track of how many bytes are expected in the current UTF-8 character.
    • Iterate through the array:
      • If we are expecting a new character (i.e., not in the middle of a multi-byte character), determine the number of bytes based on the leading bits.
      • If in the middle of a multi-byte character, ensure the current byte starts with 10.
    • Ensure no extra bytes are expected at the end of the array.

Code

Here’s the implementation of the UTF-8 validation in Java:

public class Solution {
    public boolean validUtf8(int[] data) {
        int numberOfBytes = 0;
        
        for (int i = 0; i < data.length; i++) {
            int num = data[i];
            
            // If this is a new character, calculate the number of bytes
            if (numberOfBytes == 0) {
                if ((num >> 5) == 0b110) numberOfBytes = 1;
                else if ((num >> 4) == 0b1110) numberOfBytes = 2;
                else if ((num >> 3) == 0b11110) numberOfBytes = 3;
                else if ((num >> 7) == 0b0) numberOfBytes = 0;
                else return false;
            } else {
                // Check if the byte is a valid continuation (10xxxxxx)
                if ((num >> 6) != 0b10) return false;
                numberOfBytes--;
            }
        }
        
        // All characters should be properly ended
        return numberOfBytes == 0;
    }

    public static void main(String[] args) {
        Solution solution = new Solution();
        
        int[] data1 = {197, 130, 1};
        System.out.println(solution.validUtf8(data1)); // Output: true

        int[] data2 = {235, 140, 4};
        System.out.println(solution.validUtf8(data2)); // Output: false
    }
}

Time Complexity

This solution ensures that the given data adheres to the UTF-8 encoding rules by validating each byte appropriately and keeping track of how many continuation bytes are expected.

Cut your prep time in half and DOMINATE your interview with AlgoAdvance AI