## Find ‘k’ smallest numbers from a million numbers (and take the girl with you!)

Your mum just told the “aunty next door”, the obese mother of your 6-year-crush “girl next door” that you are extremely smart and good in Math. WTF, mom?! And aunty being the crook she is, gives you her shopping list which has 1000 products with their prices and asks you to pick the least-values 10 products. Your crush is right there. If only you were in the Matrix, right?

Scale out the problem to finding the k smallest numbers from a million numbers and throw in a computer to solve the problem with code. What now? How do you go about it?

Well, you can** sort all elements in O(n log n) time** and pick the first k numbers. But can’t you try and be a little smarter? Even the girl knows basic sorting!

Maybe try and find a **worst case O(n) complexity algorithm**? Oh, stop staring! So here is the magic mantra: **Hoare’s Quick Select **algorithm.

In this algorithm,** we use the technique of Partitioning from Quick Sort** to get what we want. Partitioning basically takes a pivot element at random, puts it in its right place and puts all numbers lesser that the pivot to one side and the rest on the other. *So if we are able to get the “k”th smallest number as the pivot, we are done!** *(as we can then pick the whole partition with k elements as our answer)

1. Start with a random pivot. Iteration number = 1 for the first time.

2. Partition the array. Takes O(n/iteration number) time.

3. If the pivot is placed at the kth position (relative to original array), we are done! Else, first increment Iteration number. Then : If it is placed before the kth position, go to step 2 with the right hand partition as the array. Otherwise go to step 2 with the left partition.

Got the point? This can be a very helpful method to solve problems such as k largest or smallest element, etc.!

Now to **prove time complexity O(n) in worst case**. Since we process just one partition each time (as opposed to processing both partitions like we do in Quick Sort), here is how the effort looks –

**Effort = n + n/2 + n/4 …**

We know that **1/2+1/4+1/8+… = 1**, (can prove it too)

Hence,** Effort = n + n = 2n = O(n)**

Tada! 🙂

## Find the element which occurs > n/2 times in an array with n elements

Guess what? The Election Commissioner just called upon you to count the **10 billion votes** and find out the person who has majority votes! You are flabbergasted! You had a weekend of good sleep planned out with snack breaks every 2 hours and now you are stuck with this grunt work! **However,** **the EC ensures you that the one who wins will have more than half the total number of votes!**

If you think of **keeping a counter** for each of the (assume **(5 billion – 1) candidates**) and then go on to find the one with the maximum number of votes by incrementing counts, well you can forget the next 3000 weekends and say goodbye to your sleep.

And if you have the brilliant idea of **sorting the votes in some order** and getting the answer, you might just want to find one of those supercomputer’s access key.

What you should notice is that more than half the votes are for the winner. This can be solved using **Moore’s Voting algorithm in O(n) time with constant space**. Here is what you do.

1. Take two variables,current_candidateandcount(which corresponds to count of current candidate)

2. Set count to 0 initially.

3. Now everytime you find the count to be 0 (even at the beginning), make the current element as thecurrent_candidate.

4. Move forward in the array. If you find element the same as current candidate, increment count. If not, decrement count.

5. If count is 0, go to step 3. Else, move forward in the array

6. After traversing the whole array you’ll be left with the majority vote as thecurrent_candidate.

Bam! Want an example?

**B B C G H B B B T**

*Start with* candidate = (none), count = 0

First is a B, candidate = B, count = 1

moving forward, another B, candidate = B, count = 2

now a C, candidate = B, count = 1

and a G, candidate = (none), count = 0

now an H, candidate = H, count = 1

now a B, candidate = (none), count = 0

and another B, candidate = B, count = 1

and one more B, candidate = B, count = 2

and finally T,** candidate = B**, count = 1

Hence, you arrive on the answer candidate –> B !

Weekend plans to sleep might still pan out, eh? Cheers! 🙂

## Given N numbers, find pairs with difference K

*Problem Statement :* You have been given N numbers and a value K. Find the number of pairs in this given list of numbers whose difference is exactly equal to K.

This is a very simple one. Trying a brute force over all the elements is not being efficient. So what we’ll do is that we’ll sort the input array and apply a fine-tuned brute force as follows.

Once the array is sorted, for every number at position j, you start checking numbers backward from j to 0 (let us say using variable i, where i<j) and skip to checking for j+1 if the difference of the jth and ith element exceeds K. If you find their difference to be exactly K, you increment your count.

A very simple implementation of sorting. 🙂

## Rotated String conundrum

Problem Statement: You have an original string and its rotated version as input. (Eg. Original string = hello, Rotated string = llohe) You need to tell how many places the original string needs to be rotated by to get the rotated string.

You can come up with multiple approaches here, all of which are bullshit and not efficient. I have only been able to find a single method which gives me acceptable efficiency.

Solution: You should append the rotated string at the end of another instance of the rotated string and then find the original string in it. And your answer will be length of string minus the starting position of the found substring (original string)

Eg. llohellohe (appended llohe after llohe 😉 )

So I search fot he original string ‘hello’. This gives me the starting position 3 (index from 0). So rotation = (5-3) = 2

** EDIT: **You can implement this in a better way by indexing the original string as i%string.length where i will range from 0 to twice the string length. You are effectively going twice through the rotated string while searching for the substring. – Contributed by Sanjoy (due credits 😉 )

Pretty easy, eh? 🙂

## Boggle Solver

A facebook recruiter asked me to write the code for this particular problem. While I was able to suggest the algorithm correctly, I messed up my code. So here is the *problem statement*:

**Given a 4×4 array with randomly selected letters in it and a dictionary of valid words, find which words can be made from this array by starting from any of the 16 letters and moving in any direction (the only constraint being that you cannot come to a node twice while making a word)**

The first thing you should see here that this is clearly a Graph problem. The second noticeable aspect is that the maximum length of a word from such an array (with the constraint) can be 16 and the minimum length can be 1.

Since you have to look for all possible combinations starting from every point, Depth First Search comes to mind (why not Breadth First Search? Think about it). What you should do is* apply DFS starting at each of the 16 nodes in the array.* At every step, you add the newly formed word to a data structure meant to store strings (not allowing duplicates).

Once you have this data structure ready with you, all you need to do is search for each word in the dictionary that you have been provided. 🙂

Clear enough?

PS: Please code the DFS in the 4×4 array. It is not as simple as you assume it to be. 🙂

PPS: Does anyone have a better solution?

## Forming Cricket Teams (Balanced Partition problem)

*Problem Statement*: You have been given N players, with a corresponding strength value S for each player (Minimum strength being 0 and maximum being K). You are supposed to divide them into two teams such that the difference in the total strengths of both teams is minimized.

This is an everyday problem that we tackle while forming equally matched teams. *You might try a greedy approach here but in many cases it will fail*. It is actually known as a Balanced Partition problem and utilizes the method we used while solving the integer knapsack (1/0) problem. Great! So you know half of it! But let us give it a quick look.

We defined a term P(i,j) which specified that I would find a subset of elements in the first ‘i’ elements which sum up to ‘j’. Now we can define the term recursively, as,

P(i,j) = maximum {P(i-1,j), P(i-1, j-Si)}

While you can refer to the link given to the earlier post, what this formula specifies is that we either have a subset that sums up to ‘j’ in the first ‘i-1’ elements, or that we have a subset in the first ‘i-1’ elements whose sum when added to the ith value Si gives us ‘j’.

Now that we are done refreshing our memory, we’ll move on to the new problem.

We have now generated a 2D array giving us the possibility of a sum (j) being formed using some (<i) elements at each step. Now we need to divide this into 2 sets Team 1 and Team 2 such that the different of their totals is minimized.

For this, we’ll first find the mean value of the list of N elements. If a subset’s sum hits this particular value, we have found a solution with difference 0. Otherwise, we keep trying to find a solution where the sum of strengths of teams is closest to the mean. For this, we can once again check for all ‘i'<=’Mean’,

Minimum {Mean-i : P(n,i)=1}

where 1 indicates we have such a subset. What we are doing is that we are checking for every value less than the Mean it can be formed from any subset of the N elements. Checking this one by one, we land up with the required value closest to the Mean.

There it is. You have your solution! Divide you teams and game on! 🙂

(Retrieving the values for the subsets is as simple as maintaining back-pointers while creating the 2D array and filling it up.)

Doubts? Ask away!

PS: Need to be spoonfed (euphemism for “Me.Want.Code.”) ? Write it yourself! 😛 If you just can’t, drop me an email and I’ll mail it to you!

## Santa Claus is coming to town. (Integer Knapsack 1/0)

I hope you remember the US Pizza problem we talked about a while back. That wasn’t a 1/0 problem but the one we will discuss now certainly is.

: Santa Claus is coming to your city and needs to figure out the gifts he has to take. Since North Pole is a long way off, he can only carry gifts weighing a total of ‘C’ in his bag. He has a total of ‘N’ gifts to select from. You are supposed to help him select the gifts in a way that would use the capacity to the greatest use, i.e., the total weight of the gifts selected should be closest to C and the most number of gifts should be selected.Problem Statement

This is a dynamic programming problem and you should recognize it as soon as you see that the problem has optimal substructures in the fact that its solution can be built starting from 1 to i gifts. Also, this is a 1/0 knapsack problem since you can either select a gift (1) or leave it behind (0).

Let us define a term

M(i,j) as the optimal way of selecting gifts in a way such that selecting 1..i gifts brings the total weight up to exactly j. So your value of i varies from 1 to N and the value of j varies from 0 to C. Hence, the time complexity of the problem is equal to O(NC). (Basically, M(i,j) in a 2D array stores the number of gifts)At every point, there are

two waysin which the value of M(i,j) can be determined.

1. Check if there was a subset of gifts from number 1 to i-1 which formed a subset with total weight equal to j. You have the value M(i-1,j) as one of your candidate values.

2. Check the M(i-1, j-Wi) value such that adding the ith gifts weight gives us j. (i.e, weight of ith gift is Wi). This value plus 1 (because we pick up this ith gift as well) is another candidate value.

Now you simply have to take the maximum of these two values and place the value at M(i,j). In this way, you end up finding M(n,C) in the generated 2D array which is the required answer that our Santa Claus requires.

Here, you should notice how we built the solution for the problem in a manner similar to what we might do for a balanced partition problem. I will discuss this problem in the next post.

Stay tuned for more Dynamic Programming solutions. 🙂