## Find ‘k’ smallest numbers from a million numbers (and take the girl with you!)

Your mum just told the “aunty next door”, the obese mother of your 6-year-crush “girl next door” that you are extremely smart and good in Math. WTF, mom?! And aunty being the crook she is, gives you her shopping list which has 1000 products with their prices and asks you to pick the least-values 10 products. Your crush is right there. If only you were in the Matrix, right?

Scale out the problem to finding the k smallest numbers from a million numbers and throw in a computer to solve the problem with code. What now? How do you go about it?

Well, you can** sort all elements in O(n log n) time** and pick the first k numbers. But can’t you try and be a little smarter? Even the girl knows basic sorting!

Maybe try and find a **worst case O(n) complexity algorithm**? Oh, stop staring! So here is the magic mantra: **Hoare’s Quick Select **algorithm.

In this algorithm,** we use the technique of Partitioning from Quick Sort** to get what we want. Partitioning basically takes a pivot element at random, puts it in its right place and puts all numbers lesser that the pivot to one side and the rest on the other. *So if we are able to get the “k”th smallest number as the pivot, we are done!** *(as we can then pick the whole partition with k elements as our answer)

1. Start with a random pivot. Iteration number = 1 for the first time.

2. Partition the array. Takes O(n/iteration number) time.

3. If the pivot is placed at the kth position (relative to original array), we are done! Else, first increment Iteration number. Then : If it is placed before the kth position, go to step 2 with the right hand partition as the array. Otherwise go to step 2 with the left partition.

Got the point? This can be a very helpful method to solve problems such as k largest or smallest element, etc.!

Now to **prove time complexity O(n) in worst case**. Since we process just one partition each time (as opposed to processing both partitions like we do in Quick Sort), here is how the effort looks –

**Effort = n + n/2 + n/4 …**

We know that **1/2+1/4+1/8+… = 1**, (can prove it too)

Hence,** Effort = n + n = 2n = O(n)**

Tada! 🙂

I have recently read the same question with another awesome approach.

1.)Build a Maximum Heap for first k elements of the given array

2.)Then for the remaining n-k elements in the array,compare it with root of the maximum heap

If the selected element is smaller than root of heap then heapify else do nothing

3.)So the root will give us the kth smallest element.

Total Complexity :

Step 1 : O(k)

Step 2 : O( (n – k) * log(k))

Overall Complexity = O(k + (n-k) * log(k))

We can modify last step to get k smallest elements instead of kth,although the complexity will remain same.

Cool,isn’t it !

I will post the link as soon as poosible from where i went through this approach (Lazy me :P)

By Heapify I assume you will substitute that element for the root and then heapify the entire k element tree?

Yeah.We have to heapify the entire k element tree.That’s why the term log(k) in complexity 🙂

Yes, you can do it this way. But why would you? Time complexity is >> O(n). Especially if k and n are comparable.

I didn’t understand the complexity part ..

What if the pivot is placed at the nth position, i.e. at the end of the array after each iteration ..won’t it give a complexity of O(n^2) in that case?

My friend, what does pivot being placed at the end mean? Your array is already sorted. You stop the processing. 🙂

Hmm, what if the pivots chosen are bad? Say we need k, but we always get a pivot that is greater than k? Or even worse, every pivot chosen is the largest number, so instead of dividing it into 2 partitions, you just remove one element? That’s very unlikely, but what if it does happen, does the worst time complexity goes to around O(N^2) (Every new element introduced requires it to be compared to all other elements, hence squared)

You are absolutely right. And that is why this is a variation of the QuickSort algorithm (which if you recall, is plagued by the same issue)