UniAlgo
How Huffman Coding works?

Suppose there is a problem in which you have wood pieces of different length (a1,a2,a3,...,an) and you want to merge to make a new stick. But there is a constraint, that is in each operation the cost of merging is the sum of lengths of 2 sticks and you need to minimize to overall cost of merging.

So we need an to optimally take the sticks according to their lengths. Suppose we are given a=1, a2=4, a3=9, a4=11. Now, let's take a2 and a3 and merge them into one so their sum = 4+9=13 , and cost = 13. Then let us take a1 , since previous sticks were merged to form a stick of length 13, let us take the previous stick and a1 = 1, so new sum = 1+13=14, and cost += 14 means cost = 13 + 14 = 27. Then we are left with a4 so new stick length sum = 14+11 = 25. And total cost = becomes 27 + 25 = 52.

But instead if we had taken first a1 and a2 , then a1+a2=5, then cost = 5. Then going in sorted order we sum the next two smallest length which are currently 5 and 9, so length = 14, and cost = 5+15 = 19. Now we are left with two sticks of length 14 and 11 , so summing them the length becomes 25 and cost = 25 + 19 = 44.

So, if we observe that the greater the number we take the earlier the cost increases. Because every time we add new length to current length, the big element would repeat every time. So this is the intuition behind taking the smallest elements first which ensures that we are minimizing the immediate cost of each merge operation.

Now this is actually the concept behind actual Huffman Coding which is a way to compress a string into a segment of 0s and 1s and of the smallest length so that there would be less chance of compressed string to exceed the integer or long range.

Can you find how? A hint is to find the bits representation for each unique character, and note none of the character's bit representation should be a prefix of another !

Written by: UniAlgo Owner
Frequently Asked Questions
Comments