The simplest solution for this is maintaining a min heap of max size 5000.
- Every time a new number arrives – check if the heap is smaller then
5000, if it is – add it. - If it is not – check if the minimum is smaller then the new
element, and if it is, pop it out and insert the new element instead. - When you are done – you have a heap containing 5000 largest elements.
This solution is O(nlogk)
complexity, where n
is the number of elements and k
is the number of elements you need (5000 in your case).
It can be done also in O(n)
using selection algorithm – store all the elements, and then find the 5001th largest element, and return everything bigger than it. But it is harder to implement and for reasonable size input – might not be better. Also, if stream contains duplicates, more processing is needed.