Home > Applied Algorithms, Arrays, Iteration, Math, Sorting > Find ‘k’ smallest numbers from a million numbers (and take the girl with you!)

Find ‘k’ smallest numbers from a million numbers (and take the girl with you!)

Your mum just told the “aunty next door”, the obese mother of your 6-year-crush “girl next door” that you are extremely smart and good in Math. WTF, mom?! And aunty being the crook she is, gives you her shopping list which has 1000 products with their prices and asks you to pick the least-values 10 products. Your crush is right there. If only you were in the Matrix, right?

Scale out the problem to finding the k smallest numbers from a million numbers and throw in a computer to solve the problem with code. What now? How do you go about it?

Well, you can sort all elements in O(n log n) time and pick the first k numbers. But can’t you try and be a little smarter? Even the girl knows basic sorting!

Maybe try and find a worst case O(n) complexity algorithm? Oh, stop staring! So here is the magic mantra: Hoare’s Quick Select algorithm.

In this algorithm, we use the technique of Partitioning from Quick Sort to get what we want. Partitioning basically takes a pivot element at random, puts it in its right place and puts all numbers lesser that the pivot to one side and the rest on the other. So if we are able to get the “k”th smallest number as the pivot, we are done! (as we can then pick the whole partition with k elements as our answer)

1. Start with a random pivot. Iteration number = 1 for the first time.

2. Partition the array. Takes O(n/iteration number) time.

3. If the pivot is placed at the kth position (relative to original array), we are done! Else, first increment Iteration number. Then : If it is placed before the kth position, go to step 2 with the right hand partition as the array. Otherwise go to step 2 with the left partition.

Got the point? This can be a very helpful method to solve problems such as k largest or smallest element, etc.!

Now to  prove time complexity O(n) in worst case. Since we process just one partition each time (as opposed to processing both partitions like we do in Quick Sort), here is how the effort looks –

Effort = n + n/2 + n/4 …

We know that 1/2+1/4+1/8+… = 1, (can prove it too)

Hence, Effort = n + n = 2n = O(n)

Tada! 🙂

  1. June 26, 2013 at 3:17 am

    I have recently read the same question with another awesome approach.
    1.)Build a Maximum Heap for first k elements of the given array
    2.)Then for the remaining n-k elements in the array,compare it with root of the maximum heap
    If the selected element is smaller than root of heap then heapify else do nothing
    3.)So the root will give us the kth smallest element.
    Total Complexity :
    Step 1 : O(k)
    Step 2 : O( (n – k) * log(k))
    Overall Complexity = O(k + (n-k) * log(k))
    We can modify last step to get k smallest elements instead of kth,although the complexity will remain same.
    Cool,isn’t it !
    I will post the link as soon as poosible from where i went through this approach (Lazy me :P)

    • June 28, 2013 at 9:06 pm

      By Heapify I assume you will substitute that element for the root and then heapify the entire k element tree?

      • June 29, 2013 at 12:50 pm

        Yeah.We have to heapify the entire k element tree.That’s why the term log(k) in complexity 🙂

    • June 29, 2013 at 12:42 pm

      Yes, you can do it this way. But why would you? Time complexity is >> O(n). Especially if k and n are comparable.

  2. Swastik
    July 27, 2013 at 10:55 pm

    I didn’t understand the complexity part ..
    What if the pivot is placed at the nth position, i.e. at the end of the array after each iteration ..won’t it give a complexity of O(n^2) in that case?

    • July 28, 2013 at 8:41 am

      My friend, what does pivot being placed at the end mean? Your array is already sorted. You stop the processing. 🙂

  3. September 21, 2013 at 11:47 pm

    Hmm, what if the pivots chosen are bad? Say we need k, but we always get a pivot that is greater than k? Or even worse, every pivot chosen is the largest number, so instead of dividing it into 2 partitions, you just remove one element? That’s very unlikely, but what if it does happen, does the worst time complexity goes to around O(N^2) (Every new element introduced requires it to be compared to all other elements, hence squared)

    • December 5, 2014 at 10:55 am

      You are absolutely right. And that is why this is a variation of the QuickSort algorithm (which if you recall, is plagued by the same issue)

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: