In this article you can understand the how to build the project on data compression and decompression. In this article the major step’s is give by which you can successfully build the project on data compression and decompression in c programming language.
To make project on data compression and decompression in c programming language follow the following steps which is describe as following
Step 1 frequency table generation:-
Step 1 frequency table generation:-
At first create the frequency table. The input file is opened in read mode in the program and as it is rad a frequency table is generated for the characters present in the file. The frequency table contains alphanumeric characters and their frequencies of occurrences (weights). This is accomplished by reading each character from the file into a link list. The first character is read into a node and the following ones are compared switch it. If a match is found, the frequency of the character is updated. Else, a new node is added to the list to accommodate the new character.
Step 2 Tree generation:-
After generating the frequency table a link list is formed to construct a tree chain. The link list is sorted in the ascending order. To generate the tree, the first two links list is sorted in the ascending order. To generate the tree, the first two links from the link list are taken. Each link can be considered as child node. These two links are then joined to form a parent node whose weight is equal to the weight of the two child node. The two nodes are then removed from the link list and children are replaces with the parent.
The link list is again sorted in ascending order and the aforementioned procedure is followed. This procedure is repeated until the link list has only one node. I.e. all the nodes are used. The last node is the leaf of the tree.
Step 3 Code generation:-
After generation of the tree, binary codes are generated for each character with the help of Huffman’s algorithm. To generate the codes, the tree must be traversed in some order. Here, post order traversal (non-recursion) is used to generate the codes. In doing this, a stack has to be used by push and pop functions. This is not enough to generate the coded. And you’ve to actually push the numbers ‘0’ and ‘1’ in another stack, depending on whether the node is being traversed to left or right, respectively. The program scans the input characters in the tree and the values in the stack are popped, reversed and stored for all the characters present in the file according to Huffman’s algorithm.
The input file is read character by character and then replaced with binary codes. The binary codes are of variable length. They are converted into fixed length codes are replaced with ASCII characters. This is the toughest part of the implementation.
For writing the uncompressed data into the compressed file, one has to write one bit at time. The ‘c’ language does not provide an y direct means of writing the bits into the compressed file. So a buffer is required to collect the bits until the buffer is full. A union is used in the program to put the codes into the buffer. When the buffer is full, the data can be written into the compresses file. The buffer used in the program is a character variable. A counter is used to check whether the buffer has eight bits.
The compression ratio is given by the formula
Compassion ratio =[input file size-compressed file size ]/input file size *100
Step 4 Decompression:-
To decompress the data back to the original from we must pass the compressed file and the frequency table to the decompression program. As in the compression program, we must build the Huffman’s tree. With latest computers the cost of rebuilding the tree is negligible. To develop the tree, we fist need to read the frequency file passed to the decompression program and develop a link list as in the compression program. The binary tree (Huffman’s tree) is then constructed form the sorted link list similar to the compression program. After the tree is ready, the next step is to decode the characters form the codes. Here also the problem of reading bit by bit from the file is solved by the use of ‘bit fields’ and ‘union’. For decoding the characters from the codes, the technique used in the compression program can’t be used because codes for the output program should be of variable lengths. So we use a tree traversal technique. As we read codes from the compressed file, tree traversal is shifted to right or left depending upon the number read from the (0 or 1).