Find ‘k’ smallest numbers from a million numbers (and take the girl with you!)

Your mum just told the “aunty next door”, the obese mother of your 6-year-crush “girl next door” that you are extremely smart and good in Math. WTF, mom?! And aunty being the crook she is, gives you her shopping list which has 1000 products with their prices and asks you to pick the least-values 10 products. Your crush is right there. If only you were in the Matrix, right?

Scale out the problem to finding the k smallest numbers from a million numbers and throw in a computer to solve the problem with code. What now? How do you go about it?

Well, you can sort all elements in O(n log n) time and pick the first k numbers. But can’t you try and be a little smarter? Even the girl knows basic sorting!

Maybe try and find a worst case O(n) complexity algorithm? Oh, stop staring! So here is the magic mantra: Hoare’s Quick Select algorithm.

In this algorithm, we use the technique of Partitioning from Quick Sort to get what we want. Partitioning basically takes a pivot element at random, puts it in its right place and puts all numbers lesser that the pivot to one side and the rest on the other. So if we are able to get the “k”th smallest number as the pivot, we are done! (as we can then pick the whole partition with k elements as our answer)

1. Start with a random pivot. Iteration number = 1 for the first time.

2. Partition the array. Takes O(n/iteration number) time.

3. If the pivot is placed at the kth position (relative to original array), we are done! Else, first increment Iteration number. Then : If it is placed before the kth position, go to step 2 with the right hand partition as the array. Otherwise go to step 2 with the left partition.

Got the point? This can be a very helpful method to solve problems such as k largest or smallest element, etc.!

Now to  prove time complexity O(n) in worst case. Since we process just one partition each time (as opposed to processing both partitions like we do in Quick Sort), here is how the effort looks –

Effort = n + n/2 + n/4 …

We know that 1/2+1/4+1/8+… = 1, (can prove it too)

Hence, Effort = n + n = 2n = O(n)

Tada! 🙂

Find the element which occurs > n/2 times in an array with n elements

Guess what? The Election Commissioner just called upon you to count the 10 billion votes and find out the person who has majority votes! You are flabbergasted! You had a weekend of good sleep planned out with snack breaks every 2 hours and now you are stuck with this grunt work! However, the EC ensures you that the one who wins will have more than half the total number of votes!

If you think of keeping a counter for each of the (assume (5 billion – 1) candidates) and then go on to find the one with the maximum number of votes by incrementing counts, well you can forget the next 3000 weekends and say goodbye to your sleep.

And if you have the brilliant idea of sorting the votes in some order and getting the answer, you might just want to find one of those supercomputer’s access key.

What you should notice is that more than half the votes are for the winner. This can be solved using Moore’s Voting algorithm in O(n) time with constant space. Here is what you do.

1. Take two variables, current_candidate and count (which corresponds to count of current candidate)

2. Set count to 0 initially.

3. Now everytime you find the count to be 0 (even at the beginning), make the current element as the current_candidate.

4. Move forward in the array. If you find element the same as current candidate, increment count. If not, decrement count.

5. If count is 0, go to step 3. Else, move forward in the array

6. After traversing the whole array you’ll be left with the majority vote as the current_candidate.

Bam! Want an example?

B B C G H B B B T

Start with candidate = (none), count = 0

First is a B, candidate = B, count = 1

moving forward, another B, candidate = B, count = 2

now a C, candidate = B, count = 1

and a G, candidate = (none), count = 0

now an H, candidate = H, count = 1

now a B, candidate = (none), count = 0

and another  B, candidate = B, count = 1

and one more B, candidate = B, count = 2

and finally T, candidate = B, count = 1

Hence, you arrive on the answer candidate –> B !

Weekend plans to sleep might still pan out, eh? Cheers! 🙂

Generate maximum number from array of digits which is divisible by 2, 3 and 5

Whoa! Blogging 2 questions in one day! I guess I am actually in the mood to write tonight! Lets get started.

Let us take an example array, {9, 6, 3, 4}

So how should we go about solving for this example? Well, you don’t! Because if you notice, apart from 3, we are looking for divisibility by both 2 and 5. Which means that the final number should end in a “0”. And this array does not have a zero!

Moving on, here is the new array, {9, 6, 3, 4, 0}

Now what? Divisibility of 3 means that the sum of digits of the number should be divisible by 3. Note that you don’t have to use up all of the digits. So what do you do?

You can either go for an exponential approach by forming the power set of the array and shortlist the ones which have a sum divisible by 3 and then go on to see which of these has the most number of digits and even then if you have a tie, you’ll go for the one with the biggest digit.. and so on. But wait… I thought we were smart! Then why do all this?!

You can instead do the following –

1. Check if the array has atleast one 0 (zero). If not, we can’t form a number. Go back to bed.

2. Since you are still awake, go on to sort the array in descending order.

3. Now sum up the digits (and keep repeating Steps 3-6 till sum>=3). If your sum%3 == 0, we are done! This sequence is the biggest number. Sleep peacefully.

4. If not, there are 2 possibilities, sum%3 = 1 or 2 (let us call this remainder “R” which can have value 1 or 2)

5. We’ll first try to remove a number from this array which satisfies number%3==R starting from the smallest number in the array. As soon as you find such a number, remove it!

6. If not (life is such), go on to remove  2 numbers starting from the smallest number such that numbers%3==(3-R). Once you have removed 2 such numbers, you might have your desired number! Loop back to Step 3 and check the sum!

7. In the end you might be left with an array with only a 0. In that case your answer is either Aryabhata’s creation or “Cannot be formed” depending on the question’s requirement.

To quickly run through the example, {9, 6, 3, 4, 0}

In descending order, this is, {9, 6, 4, 3, 0} which sums to 22

22%3 = 1

Hence start trying to find a number%3=1 from the smallest,

3%3 = 1? No!

4%3 = 1? Yes!

Remove 4 and we are left with {9,6,3,0} or 9630 –> Which is the panacea for our metaphorical insomnia! Have fun! 🙂

Categories: Arrays, Iteration, Math Tags: ,

Find ‘ceil’ of a key in a Binary Search Tree (BST)

Long long time, eh?! But it has been a busy time at work. 🙂

So lets solve this problem. An example tree follows:

treeNow lets say you have the following queries –

q = 13, Answer = Does Not Exist

q = 5, Answer = 6

How do you go about finding the ceil?

One way to do it is Inorder Traversal of the tree. This will take O(n) time and then get you kicked out of the interview. Did you not read “BST” ? So we should be using that particular property of the tree!

Here is the algorithm you should be following in short

“Start with the root as current node.

If value(current node) == key, you are done!

If value(current node) > key, make answer = value(current node) and current node = leftNode(current node).

Else if value(current node)<key, make current node = rightNode(currentNode).

You have your answer once you exhaust nodes in the path you are moving on this fashion. If no value was assigned to answer, this indicates you kept moving right throughout and the tree has no value >= key.”

That does not sound right? Well, try it out for yourself! And the complexity? O(log n). Whoopie!

Categories: Recursion, Tree Tags: ,

Given N numbers, find pairs with difference K

Problem Statement : You have been given N numbers and a value K. Find the number of pairs in this given list of numbers whose difference is exactly equal to K.

This is a very simple one. Trying a brute force over all the elements is not being efficient. So what we’ll do is that we’ll sort the input array and apply a fine-tuned brute force as follows.

Once the array is sorted, for every number at position j, you start checking numbers backward from j to 0 (let us say using variable i, where i<j) and skip to checking for j+1 if the difference of the jth and ith element exceeds K. If you find their difference to be exactly K, you increment your count.

A very simple implementation of sorting. 🙂

Rotated String conundrum

Problem Statement: You have an original string and its rotated version as input. (Eg. Original string = hello, Rotated string = llohe) You need to tell how many places the original string needs to be rotated by to get the rotated string.

You can come up with multiple approaches here, all of which are bullshit and not efficient. I have only been able to find a single method which gives me acceptable efficiency.

Solution: You should append the rotated string at the end of another instance of the rotated string and then find the original string in it. And your answer will be length of string minus the starting position of the found substring (original string)

Eg. llohellohe (appended llohe after llohe 😉 )

So I search fot he original string ‘hello’. This gives me the starting position 3 (index from 0). So rotation = (5-3) = 2

EDITYou can implement this in a better way by indexing the original string as i%string.length where i will range from 0 to twice the string length. You are effectively going twice through the rotated string while searching for the substring. – Contributed by Sanjoy (due credits 😉 )

Pretty easy, eh? 🙂

Boggle Solver

A facebook recruiter asked me to write the code for this particular problem. While I was able to suggest the algorithm correctly, I messed up my code. So here is the problem statement:

Given a 4×4 array with randomly selected letters in it and a dictionary of valid words, find which words can be made from this array by starting from any of the 16 letters and moving in any direction (the only constraint being that you cannot come to a node twice while making a word)

The first thing you should see here that this is clearly a Graph problem. The second noticeable aspect is that the maximum length of a word from such an array (with the constraint) can be 16 and the minimum length can be 1.

Since you have to look for all possible combinations starting from every point, Depth First Search comes to mind (why not Breadth First Search? Think about it). What you should do is apply DFS starting at each of the 16 nodes in the array. At every step, you add the newly formed word to a data structure meant to store strings (not allowing duplicates).

Once you have this data structure ready with you, all you need to do is search for each word in the dictionary that you have been provided. 🙂

Clear enough?

PS: Please code the DFS in the 4×4 array. It is not as simple as you assume it to be. 🙂

PPS: Does anyone have a better solution?

Tries (Overview)

January 27, 2012 1 comment

What is it? Trie? Did you mean “Tree”? No! I meant “Trie”. It is a data structure (also known as “prefix tree”) that is used in form of a dictionary. It is basically a way of representing data. I don’t think I can explain this without an example. So here we see the way to represent two words in English language using a trie. Shrey and Shayne.

As you can see, a trie gives us words with the same prefixes. In this example, “SH”. So that is how you represent words. You can do the same for numbers (in any format – binary, decimal, etc). Also, note that both have different “Y” nodes because a trie represents words on the base of prefixes. If we associate a count with each node, we can also tell the number of words starting with a particular prefix.

Please note that the insertion in a trie takes O(length) time where length is that of the word being inserted. We can create a trie for english language using a structure with information and 26 pointers for the corresponding letters. Easy, eh?

So a trie can be of many uses. In the next post, we’ll see an example of using Trie. 🙂

PS : For more information on Tries, visit http://en.wikipedia.org/wiki/Trie

PPS : Sorry for the long hiatus in posting!

Forming Cricket Teams (Balanced Partition problem)

November 29, 2011 2 comments

Problem Statement: You have been given N players, with a corresponding strength value S for each player (Minimum strength being 0 and maximum being K). You are supposed to divide them into two teams such that the difference in the total strengths of both teams is minimized.

This is an everyday problem that we tackle while forming equally matched teams. You might try a greedy approach here but in many cases it will fail. It is actually known as a Balanced Partition problem and utilizes the method we used while solving the integer knapsack (1/0) problem. Great! So you know half of it! But let us give it a quick look.

We defined a term P(i,j) which specified that I would find a subset of elements in the first ‘i’ elements which sum up to ‘j’. Now we can define the term recursively, as,

P(i,j) = maximum {P(i-1,j), P(i-1, j-Si)} 

While you can refer to the link given to the earlier post, what this formula specifies is that we either have a subset that sums up to ‘j’ in the first ‘i-1’ elements, or that we have a subset in the first ‘i-1’ elements whose sum when added to the ith value Si gives us ‘j’.

Now that we are done refreshing our memory, we’ll move on to the new problem.

We have now generated a 2D array giving us the possibility of a sum (j) being formed using some (<i) elements at each step. Now we need to divide this into 2 sets Team 1 and Team 2 such that the different of their totals is minimized.

For this, we’ll first find the mean value of the list of N elements. If a subset’s sum hits this particular value, we have found a solution with difference 0. Otherwise, we keep trying to find a solution where the sum of strengths of teams is closest to the mean. For this, we can once again check for all ‘i'<=’Mean’,

Minimum {Mean-i : P(n,i)=1}

where 1 indicates we have such a subset. What we are doing is that we are checking for every value less than the Mean it can be formed from any subset of the N elements. Checking this one by one, we land up with the required value closest to the Mean.

There it is. You have your solution! Divide you teams and game on! 🙂

(Retrieving the values for the subsets is as simple as maintaining back-pointers while creating the 2D array and filling it up.)

Doubts? Ask away!

PS: Need to be spoonfed (euphemism for “Me.Want.Code.”) ? Write it yourself! 😛 If you just can’t, drop me an email and I’ll mail it to you!

Santa Claus is coming to town. (Integer Knapsack 1/0)

November 22, 2011 1 comment

I hope you remember the US Pizza problem we talked about a while back. That wasn’t a 1/0 problem but the one we will discuss now certainly is.

Problem Statement: Santa Claus is coming to your city and needs to figure out the gifts he has to take. Since North Pole is a long way off, he can only carry gifts weighing a total of ‘C’ in his bag. He has a total of ‘N’ gifts to select from. You are supposed to help him select the gifts in a way that would use the capacity to the greatest use, i.e., the total weight of the gifts selected should be closest to C and the most number of gifts should be selected.

This is a dynamic programming problem and you should recognize it as soon as you see that the problem has optimal substructures in the fact that its solution can be built starting from 1 to i gifts. Also, this is a 1/0 knapsack problem since you can either select a gift (1) or leave it behind (0).

Let us define a term M(i,j) as the optimal way of selecting gifts in a way such that selecting 1..i gifts brings the total weight up to exactly j. So your value of i varies from 1 to N and the value of j varies from 0 to C. Hence, the time complexity of the problem is equal to O(NC). (Basically, M(i,j) in a 2D array stores the number of gifts)

At every point, there are two ways in which the value of M(i,j) can be determined.

1. Check if there was a subset of gifts from number 1 to i-1 which formed a subset with total weight equal to j. You have the value M(i-1,j) as one of your candidate values.

2. Check the M(i-1, j-Wi) value such that adding the ith gifts weight gives us j. (i.e, weight of ith gift is Wi). This value plus 1 (because we pick up this ith gift as well) is another candidate value.

Now you simply have to take the maximum of these two values and place the value at M(i,j). In this way, you end up finding M(n,C) in the generated 2D array which is the required answer that our Santa Claus requires.

Here, you should notice how we built the solution for the problem in a manner similar to what we might do for a balanced partition problem. I will discuss this problem in the next post.

Stay tuned for more Dynamic Programming solutions. 🙂