algoadvance

The H-Index is a metric used to measure the productivity and citation impact of the publications of a scientist or scholar. The definition of the H-Index is as follows:

Given an array of integers citations where citations[i] is the number of citations a researcher received for their i-th paper, the H-Index is calculated as follows:

A scientist has an index h if h of their n papers have at least h citations each, and the other n − h papers have no more than h citations each.

We need to write a function hIndex to compute the H-Index.

Example 1:

Input: citations = [3, 0, 6, 1, 5]
Output: 3
Explanation: [3, 0, 6, 1, 5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. 
Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.

Example 2:

Input: citations = [1, 3, 1]
Output: 1

Constraints:

n == citations.length
1 <= n <= 5000
0 <= citations[i] <= 1000

Clarifying Questions

Are negative citation counts permissible?
- No, as per the problem constraints, citation counts are non-negative.
Can there be duplicate citation counts?
- Yes, citation counts for multiple papers can be the same.

Strategy

To determine the H-Index:

Sort: First, sort the array citations in non-decreasing order.
Iterate and Check: Traverse the sorted list from the largest value to the smallest. During the iteration:
- For each paper at index i in the sorted list, check if the number of papers with citation count greater than or equal to the current citation count (which is n - i where n is the length of the sorted list) is at least as large as the number of papers.
- The H-Index corresponds to the maximum value where this condition holds true.

This approach ensures we correctly identify the maximum H-Index by verifying the criteria against the sorted list.

Code

def hIndex(citations):
    citations.sort(reverse=True)
    h_index = 0
    for i, c in enumerate(citations):
        if c >= i + 1:
            h_index = i + 1
        else:
            break
    return h_index

# Example usage:
citations1 = [3, 0, 6, 1, 5]
citations2 = [1, 3, 1]
print(hIndex(citations1))  # Output: 3
print(hIndex(citations2))  # Output: 1

Time Complexity

Sorting the list of citations has a time complexity of (O(n \log n)), where n is the number of papers. Iterating through the list has a time complexity of (O(n)). Therefore, the overall time complexity of the algorithm is dominated by the sorting step, which is (O(n \log n)).

Try our interview co-pilot at AlgoAdvance.com

Got blindsided by a question you didn’t expect?
Spend too much time studying?
Or simply don’t have the time to go over all 3000 questions?