H-index problem
Do not miss this exclusive book on Binary Tree Problems. Get it now for free.
Problem Statement:
Calculate the researcher's h-index from an array of integer citations, where citations[i] is the number of citations a researcher obtained for their ith work.
According to the definition of h-index, a scientist has an index h if h of their n articles have at least h citations each, while the other n h papers have no more than h citations each.
In cases when h has multiple possible values, the highest value is used to determine the h-index.
Pre-requisite: Counting Sort
Example :
Example 1:
Input: citations = [8,2,5,1,7]
Output: 3
Explanation: [8,2,5,1,7] means the researcher has 5 papers in total and each of them has received 8,2,5,1,7 citation.
Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.
Example 2:
Input: citations = [11,15]
Output: 2
Explanation: [11,15] means the researcher has 2 papers in total and each of them has received 11,15 citation.
Since the researcher has 2 papers with at least 2 citations each, their h-index is 2.
Note: The following example indicates that h-index need not be equal to the number of citations, but is actually the minimum no of citations(say h) equal to the number of papers published having equal to or more than h citations.
SOLVING THE PROBLEM
Logic:
As per the definition of h-index,an easy approach will be to sort the array first and then look for the possible values of h-index.
METHOD - I : Brute Force
Brute-Force: The brute force method will be to sort the array first using inbuilt sorting. Next we can use linear search to find the h-index value which satisfies the condition.
Algorithm:
- Sort the array
- Initialize index to 0;
- for loop from 0 to length of array
- if the value of array at index i is greater than or equal to length of array-i.
- return (length of array - i)
Implemented Code:
public class Solution {
public int hIndex(int[] citations) {
// sorting the citations in ascending order
Arrays.sort(citations);
// finding h-index
for(int i = 0;i<citations.length;i++){
if(citations[i] >= citations.length-i) return citations.length-i;
}
return 0;
}
}
Explanation:
By using Arrays.sort() method we gt our sorted array first.
h-index = length of array-index from where the array has h or more citations
We write the above statement using if condition and loop over the sorted citations array to find the possible solution using linear search.
Time Complexity
Time: O(nlogn)
Reason : Arrays.sort() is an inbuilt function has a time complexity of 0(nlogn).
As for the linear search time complexity is O(n).
Hence total time complexity is O(nlogn).
Space: O(1)
Reason : We have used no extra space. Hence space complexity is O(1).
METHOD - II : Using Counting Sort
Counting Sort
Counting sort is an efficient, stable sorting algorithm that operates by counting the number of occurrences of each distinct element in the input array, and then using that information to place the elements in their correct positions in the output array.
This algorithm is particularly well suited for sorting arrays with a small number of distinct elements, and has a time complexity of O(n+k) where n is the number of elements in the input array and k is the number of distinct elements.
Why use Count Sort?
As the citations can be a very large number, however, the maximum h index can only be the number of papers, we can use the maximum counting number to find h-index.
All +ve integers since citation number.
Sorts faster for smaller arrays.
Algorithm
- initialize a count array of length n+ 1 where n is the length of input array
- for each value in citations array; we increment the count in count array
- initialize index to n; and sum to 0
- for each value in count array starting from n.
- if sum + count [index] > index
- decrement index
Implemented Code:
public class Solution {
public int hIndex(int[] citations) {
int n = citations.length;
int[] papers = new int[n + 1];
// counting papers for each citation number
for (int i: citations)
papers[Math.min(n, i)]++;
// finding the h-index
int h = n;
for (int s = papers[n]; h > s; s += papers[h])
h--;
return h;
}
}
Time Complexity :
TC - O(n)
Reason : Counting Sort has a time complexity of O(n) and linear search applied also has a time complexity of O(n). Hence total time complexity is O(n).
SC - O(n)
Reason : Extra space has been used to form count array. Hence space complexity is O(n).
Explanation :
The method first initializes a variable "n" to the length of the citations array and creates an array "papers" of size n+1.
It then uses a for loop to count the number of papers for each citation number, much similar to a count array in Counting sort.
We then calculates the h-index by iterating through the papers array in reverse order, starting with the last element (n), and adding the number of papers for each citation number until the total number of papers equals or exceeds the current citation number.
The h-index is then returned as the current citation number.
Conclusion:
With this article at OpenGenus, we have solved the following problem of calculation h-index using sorting and count sorting algorithm.
Sign up for FREE 3 months of Amazon Music. YOU MUST NOT MISS.