## Find ‘k’ smallest numbers from a million numbers (and take the girl with you!)

Your mum just told the “aunty next door”, the obese mother of your 6-year-crush “girl next door”, that you are extremely smart and good at Math. WTF, mom?! And aunty, being the crook she is, hands you her shopping list of 1000 products with their prices and asks you to pick the 10 cheapest ones. Your crush is right there. If only you were in the Matrix, right?

Scale out the problem to finding the k smallest numbers from a million numbers and throw in a computer to solve the problem with code. What now? How do you go about it?

Well, you can **sort all elements in O(n log n) time** and pick the first k numbers. But can’t you try and be a little smarter? Even the girl knows basic sorting!

Maybe try and find an **average-case O(n) algorithm**? Oh, stop staring! So here is the magic mantra: **Hoare’s Quick Select** algorithm. (Its worst case is O(n^2), but with random pivots that is extremely unlikely; the median-of-medians variant even guarantees O(n).)

In this algorithm, **we use the technique of Partitioning from Quick Sort** to get what we want. Partitioning picks a pivot element (at random, here), puts it in its final sorted place, with all numbers less than the pivot on one side and the rest on the other. *So if we manage to get the “k”th smallest number as the pivot, we are done!* (we can then pick the first k elements as our answer)

1. Start with a random pivot. Iteration number = 1 for the first time.

2. Partition the current subarray. This takes time linear in its size: on average n on the first iteration, then n/2, then n/4, and so on.

3. If the pivot lands at the kth position (relative to the original array), we are done! Else, first increment the iteration number. Then: if the pivot landed before the kth position, go to step 2 with the right-hand partition as the array; otherwise go to step 2 with the left partition.

Got the point? This can be a very helpful method to solve problems such as k largest or smallest element, etc.!
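The steps above can be sketched in code. The post names Hoare’s scheme; for brevity this sketch (function name mine) uses the simpler Lomuto-style partition, which gives the same pivot-placement guarantee:

```python
import random

def quickselect_k_smallest(nums, k):
    """Return the k smallest numbers (in any order), expected O(n) time.
    A sketch of Quick Select, not tuned production code."""
    nums = list(nums)
    lo, hi = 0, len(nums) - 1
    while lo < hi:
        # Pick a random pivot and partition (Lomuto-style) around it.
        p = random.randint(lo, hi)
        nums[p], nums[hi] = nums[hi], nums[p]
        pivot, store = nums[hi], lo
        for i in range(lo, hi):
            if nums[i] < pivot:
                nums[i], nums[store] = nums[store], nums[i]
                store += 1
        nums[store], nums[hi] = nums[hi], nums[store]  # pivot in final place
        if store == k - 1:
            break                 # pivot is the kth smallest: done
        elif store < k - 1:
            lo = store + 1        # answer lies in the right partition
        else:
            hi = store - 1        # answer lies in the left partition
    return nums[:k]
```

Once the loop stops, the first k slots hold the k smallest numbers, though not necessarily in sorted order.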

Now to see why the **expected time is O(n)**. Since we process just one partition each time (as opposed to processing both partitions like we do in Quick Sort), and each partition is on average half the size of the previous one, here is how the effort looks –

**Effort = n + n/2 + n/4 …**

We know that **1/2 + 1/4 + 1/8 + … = 1** (a geometric series, and you can prove it too),

Hence, **Effort = n + n = 2n = O(n)**

Tada! 🙂

## Given N numbers, find pairs with difference K

*Problem Statement :* You have been given N numbers and a value K. Find the number of pairs in this given list of numbers whose difference is exactly equal to K.

This is a very simple one. A brute force over all pairs of elements takes O(n^2), which is not efficient. So what we’ll do is sort the input array and apply a fine-tuned brute force as follows.

Once the array is sorted, for every number at position j, you check numbers backward from j−1 down to 0 (say using variable i, where i < j) and skip ahead to j+1 as soon as the difference of the jth and ith elements exceeds K (the array is sorted, so earlier elements only differ more). Whenever you find a difference of exactly K, you increment your count.
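A minimal sketch of this sorted, fine-tuned brute force (function name is mine):

```python
def count_pairs_with_diff(nums, k):
    """Count pairs whose difference is exactly k. O(n log n) for the sort,
    plus a backward scan that stops early thanks to the sorted order."""
    nums = sorted(nums)
    count = 0
    for j in range(len(nums)):
        for i in range(j - 1, -1, -1):
            diff = nums[j] - nums[i]
            if diff > k:
                break          # earlier elements are even smaller: skip to j+1
            if diff == k:
                count += 1
    return count
```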

A very simple application of sorting. 🙂

## Find if a given array is a sequence of consecutive integers.

*Problem Statement:* You have been given an array of positive integers, obviously in a random order. You are supposed to find whether the elements, taken together, form a sequence of consecutive integers. (The array might have duplicates, in which case the answer is no.)

**Sorting** (approach 1) might be the first solution that pops up in everyone’s mind. Well, that would take O(n lg n) time using the best algorithms out there. What if we want to do it in O(n) time?

You will then tell me to find the MIN and MAX of the array, make another array (with default value 0) and increment the corresponding slot as you discover each element. That is your concept of **counting** (approach 2), if you remember. You achieved O(n) time but added O(n) space. Now you might struggle to reduce this space complexity.
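Here is a sketch of the counting idea (approach 2); note that besides spotting duplicates, it must also check that the range MAX − MIN + 1 matches the element count (function name is mine):

```python
def is_consecutive(nums):
    """True if nums is a permutation of some run lo, lo+1, ..., hi.
    O(n) time, O(n) extra space (the counting array)."""
    if not nums:
        return False
    lo, hi = min(nums), max(nums)
    if hi - lo + 1 != len(nums):
        return False               # range size must equal element count
    seen = [0] * (hi - lo + 1)
    for x in nums:
        seen[x - lo] += 1
        if seen[x - lo] > 1:
            return False           # a duplicate breaks the sequence
    return True
```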

*EDIT: I’ll try to come up with a better solution shortly. (Forget the one that used to be here.)*

Have fun!

## Find the only element in an array which does not have a duplicate. (Only occurs once)

While this question might seem eerily similar to the last one, it is vastly different. Here, you have an array in which every element has a duplicate except one element. For example,

2, 2, 3, 4, 5, 9, 5, 4, 9, 1, 3

Here, 1 is your expected answer since it does not have a duplicate. So how would you go about tackling this question? I will point out three approaches:

1. **Sort** the array. It will take **O(n lg n) time** with a sorting method like Quick Sort (or one of the others we discussed previously). Once the array is sorted, you just need to move through the array once and compare every pair of consecutive elements. The first pair that isn’t equal will be your answer. Sorting the earlier example, we get,

1, 2, 2, 3, 3, 4, 4, 5, 5, 9, 9

You can quickly find that 1 occurs only once by this kind of comparison.

2. **Counting** the number of times each number occurs by utilizing another array. But this can only be used when you have a fixed range within which the elements can lie. Even if we have such a range given, this way takes **O(n) time and extra space proportional to the range**. Can we do better?

3. Yes, we can do better. The best method takes only **O(n) time** and utilizes **bit manipulation**. All you need to do is move through the array once and XOR each element. In the previous example, we would do,

2^2^3^4^5^9^5^4^9^1^3 (XOR sign – ^)

The result will be 1, i.e., your answer. Why? XOR of two identical bits is zero. Also, XOR is commutative and associative, so we may reorder the terms. The above expression is equivalent to,

1^2^2^3^3^4^4^5^5^9^9

Hence, all duplicates will give a zero. And zero XORed with your no-duplicates-for-it number will give you the number itself. Tada! 🙂
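The whole trick fits in a few lines (function name is mine):

```python
def find_unique(nums):
    """XOR every element together: duplicates cancel out (x ^ x == 0),
    leaving only the element that occurs once. O(n) time, O(1) space."""
    result = 0
    for x in nums:
        result ^= x
    return result
```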

Nice, eh?!

## Determine if there are any duplicates in an array which can have values from 1 to N

This one seems to be simple. The first approach that jumps out is the shitty O(n^2) method. Of course you are intelligent enough to discard the idea. Moving on, you recall having read an overview of sorting algorithms on allaboutalgorithms. Oh yes! That would work, wouldn’t it?

Certainly. One way to solve this problem is to **sort the array in time O(n lg n)** and then traverse through the array once comparing adjacent elements and seeing if they are equal (since duplicates will always end up being together after you sort the data-set).

But what if an interviewer asks you to solve it with an algorithm having less time complexity but gives you the freedom to use some space? You should now make use of the fact that the question specifically says the elements of the array would lie between 1 and N. Remember the **concept of counting** that we used in counting sort? We can use the same here. Define an array of N elements (initially all zero). Then move through your array once and increment the corresponding position of the discovered element in your extra array. This array will finally hold the counts of the number of occurrences of each element between 1 and N in the array. Now all you need to do is move through this new array and see if any position has a value > 1. That would signify that duplicates do exist in the array. In this way, the problem is solved with **time complexity O(n) and space complexity of O(n)**.
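The counting approach above can be sketched like this (function name is mine); it even bails out early the moment a count exceeds 1:

```python
def has_duplicates(nums):
    """Detect duplicates in an array whose values lie in 1..len(nums).
    O(n) time, O(n) extra space for the counting array."""
    counts = [0] * (len(nums) + 1)   # index 0 unused; values are 1..N
    for x in nums:
        counts[x] += 1
        if counts[x] > 1:
            return True              # a second occurrence: duplicate found
    return False
```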

Nice! Another new concept learned. Another problem solved. You can move on now! 🙂

## Find a pair (not consecutive) of numbers in an array whose sum is closest to zero.

The problem statement is as defined in the title of the post. Please note that the problem is only interesting when the array has both negative and positive numbers and the numbers are randomly ordered. Now let me guide you through the whole thought process. (Don’t be a dumb person. The thought process is important even though the pseudo code is there at the end. Read through it.)

So tell me, how would you go forward in searching for two numbers (not necessarily consecutive) in an array of positive and negative numbers whose sum is closest to zero? What is the best possible algorithm that comes to your mind?

O(n^2) ? That is lame. Anyone can do that.

Optimized version of O(n^2) but still O(n^2)? Now you are talking like a cornered person who can’t think of a way out.

Can’t you think up an O(n lg n) algorithm? What was it that we studied in the last post? Sorting. And what is the time complexity of some of those algorithms? Right. You just made the connection and got the hint. (Use sorting algorithms.)

*You*: Ok. So I sorted the array, but how does that help me? I still have the same old O(n^2) algorithm.

*Me*: Well, did I ask you to sort it in the old-fashioned way? How about a modified sort?

If you still haven’t figured it out, **here it is, plain and simple**:

You sort the array neglecting the signs on the numbers. For example,

2 -1 6 -3 -4 3 11

becomes, -1 2 -3 3 -4 6 11

**Now,** can you figure it out? Well, all you need to do is find the sum of each pair of consecutive elements of this array and pick the pair whose sum is closest to zero (in this case, the sum of -3 and 3). You have your answer! 🙂

**Pseudo Code** :

-> Input array

-> Sort array neglecting sign of numbers

-> Sum up each pair of consecutive numbers in this new array

-> The pair with the sum closest to zero is your answer.

*Why does this work?* Sorting by absolute value places numbers of similar magnitude next to each other, in spite of their signs. A pair summing close to zero must consist of two numbers of similar magnitude (with opposite signs), so it must show up as one of these adjacent pairs.
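The pseudo code above translates almost line for line into Python (function name is mine; it assumes the array has at least two elements):

```python
def closest_to_zero_pair(nums):
    """Return the pair (a, b) whose sum is closest to zero.
    Sort by absolute value; the answer is some adjacent pair. O(n log n)."""
    nums = sorted(nums, key=abs)         # sort neglecting the signs
    best = (nums[0], nums[1])
    for a, b in zip(nums, nums[1:]):     # sum each consecutive pair
        if abs(a + b) < abs(best[0] + best[1]):
            best = (a, b)
    return best
```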

The whole trick was in ignoring the signs and your sorting algorithm did the rest. That is a pretty good example of the application of sorting algorithms. One that is not so apparent but comes to your mind if you know your basics! Hope you took something good from this first applied algorithm post! 🙂

See you soon.

## Sorting Algorithms (Overview)

“Bah! This is what you give to us as your first *useful* post?”, might be what is running through your mind right now. Well, the answer is, yes. Sorting algorithms are one of the most used classes of algorithms in every sort of application that you use. While these might be taught well at the graduate level, their applications are not so well explained. Students don’t realize the different kinds of problems that can be solved utilizing such algorithms.

So let me first name the algorithms which are used for sorting and give a short overview for each. In the subsequent posts, I’ll go on to show the applications (the usual suspects, and some new ones) of the sorting algorithms. But you just *have* to know the following methods for any interviews/development process.

1. **Bubble Sort** – With a time complexity of O(n^2), this is one of the worst algorithms you can use to sort your data. If you plan to use this ever, you aren’t fit to be on this website. The noobs can look up the algorithm and understand it here on this wiki link.

2. **Exchange/Selection Sort** – The complexity is O(n^2) once again, and what was said for Bubble Sort is true for this as well. Once again, for noobs, the wiki link is here. Both this and Bubble Sort can be fine-tuned a bit, but not to much use.

3. **Insertion Sort** – This algorithm can fall in the same category as the previous two if implemented in the way explained at the start here. However, things can get interesting if this kind of sort is performed on a linked list *while creating it*. But again, such a method will prove efficient only for small data sets.

4. **Merge Sort** – Hmm. O(n lg n) time complexity. Now we come to interesting things. Here, what we do is separate the data set into smaller fragments till we are left with segments of size one. Once this has been done, we start merging these fragments in a way such that each merged fragment is sorted. This is known as the *Divide-And-Conquer* method. Since I expect readers to know how to implement all of these sorting algorithms, the ones not knowing the exact working of the algorithm can again refer to the wiki link here.

5. **Heap Sort** – Oh, isn’t this just cool! It has a time complexity of O(n lg n) as well. In heap sort, we form a heap of all the elements (a max or min heap, depending on what you want), then repeatedly remove the root and re-heap what remains. So what you might be wondering is, what exactly is a heap?! Quite simply put, it is a binary tree with the property that every parent in the tree is greater (in a max heap) or lesser (in a min heap) than both its children. Also, each new element is added in a balanced manner before the tree is re-heaped. You know what, I’ll give you a great *powerpoint slide* from my professor which explains Heap Sort very well. Here you go. Heap Sort (The great part is when you realize a heap can be easily implemented using an array itself. Remember doing that with binary trees?! 🙂 )

6. **Quick Sort** – With a time complexity of O(n lg n) on average, this sort is another example of the *Divide-And-Conquer* method that we talked about earlier. The reader might want to note that such a method (DAC) generally uses recursion as its *modus operandi*. In this particular method, an element is selected as the pivot (generally the last element, but variations exist) and this pivot is put in its right place. This divides the data-set into 2 parts (except in the worst case, i.e., a data-set already sorted in some order, where the complexity degrades to O(n^2)). The same technique is then applied to the two parts. And so on. Again, here is an awesome *powerpoint slide* to make you understand. Quicksort

7. **Counting Sort** – This is a sort based on counting how many times each number occurs. It can only be applied when you know the range of elements that can be present in your data-set. With a time complexity of O(n), it adds a space complexity of the order of the number of possible elements in the data-set. The sort is easily understandable, and the concept of counting is better used in other places, as will be explained soon in an applied post. 🙂 Noobs – refer to the wiki link here for details.
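Of the list above, Merge Sort is the cleanest illustration of Divide-And-Conquer, so here is a compact sketch (function name is mine):

```python
def merge_sort(nums):
    """Classic Divide-And-Conquer: split to size-one fragments,
    then merge sorted halves back together. O(n log n) time."""
    if len(nums) <= 1:
        return nums                       # a single element is sorted
    mid = len(nums) // 2
    left = merge_sort(nums[:mid])         # divide...
    right = merge_sort(nums[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # ...and conquer (merge)
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]  # append whichever half remains
```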

Hah! So we come to the end of an overview of the sorting algorithms. As mentioned in the introductory post, I will dedicate more time on this blog on tackling application problems rather than explaining such simple algorithms in painful detail. That, you can find anywhere on the internet. So hopefully this will be a recap enough for you to understand the application questions of sorting algorithms in the next few posts.

*PS: I skipped 2 ubercool algorithms: Bucket and Radix. I feel these will be better explained with an application, and I will follow up on them in later posts. 🙂*