FPGA Implementation of CORDIC Based DHT for Image Processing Applications | Open Access Journals

ISSN ONLINE(2319-8753)PRINT(2347-6710)

FPGA Implementation of CORDIC Based DHT for Image Processing Applications

Shaik Waseem Ahmed1, Sudhakara Reddy.P2
  1. P.G. Student, Department of Electronics and Communication Engineering, SKIT College, Srikalahasti, India
  2. Associate Professor, Department of Electronics and Communication Engineering, SKIT College, Srikalahasti, India

Visit for more related articles at International Journal of Innovative Research in Science, Engineering and Technology


Digital image processing (DIP) is the use of computer algorithms to process digital images. A simple digital camera converts light energy into electrical energy, digitizes the resulting signal, and applies a compression algorithm to reduce the memory needed to store the image. Because this compression algorithm is invoked every time an image is captured and stored, there is a need for an efficient algorithm that gives the same result as existing algorithms while consuming less power. Image compression is useful because it reduces the use of expensive resources such as memory or transmission bandwidth. On the downside, compression techniques introduce distortion, and additional computational resources are required for compression and decompression of medical image data. In this paper, we propose a DHT-based image compression technique. The computational complexity is further reduced by using a pipelined CORDIC architecture to compute the sine and cosine terms in the DHT. The CORDIC-based DHT is partially simulated and synthesized using the Xilinx ISE design suite and implemented on the targeted FPGA. The hardware requirements and gate delay values are reported.


CORDIC, DHT, FPGA, Image Processing


Compression, the art and science of reducing the amount of data required to represent an image, is one of the most useful and commercially successful technologies in the field of digital image processing. Digital image and video compression is now essential. Biomedical image storage and transmission would not be feasible unless a high degree of compression is achieved. Compression is useful because it reduces the use of expensive resources such as memory (hard disks) or transmission bandwidth. On the downside, compression techniques introduce distortion, and additional computational resources are required for compression and decompression of the data. The compression ratio (C) is defined as the ratio of the size of the compressed data to that of the uncompressed data.


Image compression involves minimizing the size in bytes of a graphics file without degrading the quality of the image to an unacceptable level. The reduction in file size allows more images to be stored in a given amount of disk or memory space. It also reduces the time and bandwidth required for images to be sent over the Internet or downloaded from Web pages.
There are several different ways in which image files can be compressed. For Internet use, the two most common compressed graphic image formats are the JPEG format and the GIF format. The JPEG method is more often used for photographs, while the GIF method is commonly used for line art and other images in which geometric shapes are relatively simple.
The steps involved in image compression are as follows:
1. The image is first divided into blocks of 8x8 pixel values. These blocks are then fed to the encoder, from which the compressed image is obtained.
2. The next step is mapping of the pixel intensity values to another domain. The mapper transforms images into a (usually non-visual) format designed to reduce spatial and temporal redundancy. It can be done by applying various transforms to the images. Here the discrete Hartley transform is applied to the 8x8 blocks.
3. Quantizing the transformed coefficients results in the loss of information that is irrelevant for the specified purpose.
4. Source coding is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use, through the use of specific encoding schemes.
To retrieve the image, the steps of the forward process are reversed. First the data is decoded using the decoder. Next the inverse transform (IDHT) is calculated to recover the 8x8 blocks. These blocks are then joined to form the final image. The pixel values of the reconstructed image show that some of the high-frequency components are preserved, which indicates that the edge information of the image is retained.
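The forward and reverse steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a grayscale image stored as a list of rows whose dimensions are multiples of 8, uses a separable row-column DHT (one of several 2-D Hartley variants), applies a single uniform quantization step, and omits the source-coding stage. All function names are hypothetical.

```python
import math

BLOCK = 8  # block size used in the text

def cas(theta):
    # Hartley kernel: cas(theta) = cos(theta) + sin(theta)
    return math.cos(theta) + math.sin(theta)

# 8x8 DHT kernel matrix, H[k][n] = cas(2*pi*n*k/8); it is symmetric.
H = [[cas(2 * math.pi * n * k / BLOCK) for n in range(BLOCK)] for k in range(BLOCK)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dht2_separable(block):
    # Row-column (separable) 2-D DHT, assumed here for simplicity: H * block * H.
    return matmul(matmul(H, block), H)

def idht2_separable(coeffs):
    # The same transform is reused for the inverse, followed by scaling by 1/64.
    return [[v / (BLOCK * BLOCK) for v in row] for row in dht2_separable(coeffs)]

def split_blocks(image):
    # Yield (row offset, column offset, 8x8 block) for every block in the image.
    for r in range(0, len(image), BLOCK):
        for c in range(0, len(image[0]), BLOCK):
            yield r, c, [row[c:c + BLOCK] for row in image[r:r + BLOCK]]

def compress_decompress(image, step):
    """Transform, quantize, dequantize and inverse-transform every 8x8 block."""
    out = [[0] * len(image[0]) for _ in image]
    for r, c, block in split_blocks(image):
        coeffs = dht2_separable(block)
        quantized = [[round(v / step) for v in row] for row in coeffs]  # lossy step
        restored = idht2_separable([[q * step for q in row] for row in quantized])
        for i in range(BLOCK):
            for j in range(BLOCK):
                out[r + i][c + j] = round(restored[i][j])
    return out
```

A larger `step` discards more coefficient detail and so gives more distortion; in a full codec the `quantized` integers would be passed on to the source coder.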


The Hartley transform is an integral transform closely related to the Fourier transform, but one that transforms real-valued functions to real-valued functions. It was proposed as an alternative to the Fourier transform by R. V. L. Hartley in 1942. Compared to the Fourier transform, the Hartley transform has the advantages of transforming real functions to real functions and of being its own inverse. The discrete version of the transform, the discrete Hartley transform (DHT), was introduced by R. N. Bracewell in 1983. The DHT is a real-valued transform that gives only real transform coefficients for a real input stream. Its main advantage over the DCT is that it reduces the memory content by up to 50%, since the inverse transform is identical to the forward transform. It also retains the higher-frequency components, which preserves the detail of the image. Since it is a real-valued transform, unlike the DFT, its computational complexity is also lower than that of DFT algorithms [2].
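The self-inverse property can be checked numerically. The sketch below is a direct O(N^2) evaluation written for illustration only (not a fast algorithm and not the hardware design); it assumes the unnormalized forward transform, so the inverse is the same transform followed by division by N. The sample values are arbitrary.

```python
import math

def dht(x):
    """Direct O(N^2) discrete Hartley transform of a real sequence."""
    n = len(x)
    return [
        sum(x[j] * (math.cos(2 * math.pi * j * k / n) + math.sin(2 * math.pi * j * k / n))
            for j in range(n))
        for k in range(n)
    ]

def idht(h):
    """Inverse DHT: the same kernel as the forward transform, scaled by 1/N."""
    n = len(h)
    return [v / n for v in dht(h)]

samples = [12, 50, 61, 55, 70, 120, 200, 255]  # arbitrary 8-point real input
coeffs = dht(samples)       # real coefficients only, no imaginary part
restored = idht(coeffs)     # matches samples up to floating-point rounding
```

Because forward and inverse share one kernel, a hardware design needs only a single transform datapath and coefficient table for both directions.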


So, for computing the 8-point DHT, the multiplication with 1/√2 can be read from a ROM, while a block of pipelined
Figure 5 shows the structure of a processing element that implements one CORDIC iteration. The CORDIC algorithm has two operating schemes: rotation mode and vectoring mode. In rotation mode, the aim is to rotate the given input vector $(x, y)^T$ by a given angle. After n iterations, $z_n$ is driven to zero and the total accumulated rotation angle equals the desired angle. The parallel pipelined architecture for CORDIC is an unrolled version of the sequential CORDIC algorithm: instead of reusing the same hardware for all iteration stages, it provides a separate processor for every iteration.
An example of the parallel CORDIC architecture for rotation mode is shown in figure 6. Each of the n processors in the block performs a specific iteration, and a particular processor always performs the same iteration. All the shifters perform a fixed shift, so they can be implemented by wiring in the FPGA [5]. Every processor uses an individual arctan value, which can also be hardwired to the input of its angle accumulator. No state machine is needed, which keeps this type of architecture simple. The parallel architecture is much faster than the sequential architecture described in the “iterative word-serial architecture” in figure 6. It accepts new input data and produces a result at every clock cycle, with a latency of n clock cycles. This parallel-pipelined architecture is the one used in the design of the DHT, because it provides high throughput and low power consumption. The above conditions can then be applied to the DHT equation.
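The rotation-mode iteration described above can be modelled in software as follows. This is a floating-point sketch for illustration, not the paper's Verilog: each pass of the loop corresponds to one pipeline stage, the multiplication by `2.0 ** -i` stands in for the fixed shift, and `atan_table` plays the role of the hardwired arctan constants. The function name, the 16-stage default, and the final gain division are assumptions.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians (|angle| below about 1.74) in CORDIC rotation mode.

    Returns the rotated vector and the residual angle z, which is driven toward zero.
    """
    # Per-stage constants; in hardware these would be hardwired to each stage.
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector; the total gain is a known constant.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle
    for i in range(iterations):            # one loop pass = one pipeline stage
        d = 1.0 if z >= 0 else -1.0        # rotation direction from the sign of z
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atan_table[i]             # accumulate the applied angle
    return x / gain, y / gain, z
```

Calling `cordic_rotate(1.0, 0.0, theta)` returns approximations of cos(theta) and sin(theta), which are the two terms the DHT kernel needs.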
The DHT is given
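For reference, the standard unnormalized form of the N-point DHT, as commonly stated following Bracewell, is sketched below. The symbols $x(n)$, $H(k)$ and $N$ are chosen here for illustration and the normalization is an assumption; neither is taken from the paper.

```latex
H(k) = \sum_{n=0}^{N-1} x(n)\,\operatorname{cas}\!\left(\frac{2\pi n k}{N}\right),
\qquad k = 0, 1, \dots, N-1,
\qquad \operatorname{cas}\theta = \cos\theta + \sin\theta .

% Inverse: the same kernel, scaled by 1/N
x(n) = \frac{1}{N}\sum_{k=0}^{N-1} H(k)\,\operatorname{cas}\!\left(\frac{2\pi n k}{N}\right).
```

For $N = 8$ the angles are multiples of $\pi/4$, so the only non-trivial magnitude taken by the cosine and sine terms is $1/\sqrt{2}$.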


The real and imaginary parts of the Fourier transform are given by the even and odd parts of the Hartley transform, respectively (up to a sign on the imaginary part that depends on the Fourier convention used).
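This relationship can be verified numerically. The sketch below assumes the common DFT convention with kernel exp(-2πi·nk/N), under which the imaginary part is the negative of the odd part; the function names are hypothetical and the transforms are direct O(N^2) evaluations.

```python
import cmath
import math

def dht(x):
    """Direct discrete Hartley transform."""
    n = len(x)
    return [sum(x[j] * (math.cos(2 * math.pi * j * k / n) + math.sin(2 * math.pi * j * k / n))
                for j in range(n)) for k in range(n)]

def dft(x):
    """Direct discrete Fourier transform with the exp(-2*pi*i*n*k/N) kernel."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_from_dht(h):
    """Rebuild the DFT from the even and odd parts of the DHT."""
    n = len(h)
    out = []
    for k in range(n):
        mirrored = h[(n - k) % n]          # H(N - k), with H(N) taken as H(0)
        even = (h[k] + mirrored) / 2       # equals the real part of the DFT
        odd = (h[k] - mirrored) / 2        # equals minus the imaginary part
        out.append(complex(even, -odd))
    return out

signal = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]  # arbitrary real input
rebuilt = dft_from_dht(dht(signal))
reference = dft(signal)
```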
Initially the DHT was decomposed into cosine and sine terms, and a CORDIC processor was then used to compute these trigonometric components. For the hardware implementation, we developed Verilog code and compiled it using ModelSim. The design was then simulated and synthesized using Xilinx ISE design suite version 12.0 and implemented on a Spartan-6 FPGA. Finally, the synthesis report and delay report were recorded. The results show that the total real time taken for execution is 1.00 s and the total CPU time is 0.94 s. The macro statistics reported for the CORDIC block are: one ROM (a single 4x8-bit ROM); 20 adders/subtractors, including three 8-bit adders and one 8-bit subtractor; 32 registers (eight 2-bit registers and 24 8-bit registers); and 2 multiplexers. The simulation results are shown in figure 9.
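A software model of this flow, with CORDIC supplying the cosine and sine terms of the DHT, might look like the following. It is a sketch under stated assumptions rather than a translation of the authors' Verilog: the 14-bit fractional format, the 14 stages, the quadrant folding, and all names are illustrative choices, and every kernel value is computed by CORDIC instead of being read from a ROM.

```python
import math

FRAC_BITS = 14              # assumed fixed-point fraction width
STAGES = 14                 # assumed number of CORDIC stages
ONE = 1 << FRAC_BITS        # fixed-point representation of 1.0

# Arctan constants as fixed-point integers (the per-stage hardwired values).
ATAN_ROM = [round(math.atan(2.0 ** -i) * ONE) for i in range(STAGES)]

# Starting x is pre-divided by the CORDIC gain so no final multiplier is needed.
_gain = 1.0
for _i in range(STAGES):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
X0 = round(ONE / _gain)

def cordic_cos_sin(angle):
    """Return (cos, sin) of `angle` as fixed-point integers scaled by ONE."""
    # Fold the angle into [-pi/2, pi/2], where the iteration converges.
    angle = (angle + math.pi) % (2 * math.pi) - math.pi
    sign = 1
    if angle > math.pi / 2:
        angle -= math.pi
        sign = -1
    elif angle < -math.pi / 2:
        angle += math.pi
        sign = -1
    x, y, z = X0, 0, round(angle * ONE)
    for i in range(STAGES):                 # shift-and-add only, no multipliers
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_ROM[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_ROM[i]
    return sign * x, sign * y

def dht_cordic(samples):
    """N-point DHT whose cos and sin terms come from the CORDIC model."""
    n = len(samples)
    out = []
    for k in range(n):
        acc = 0                             # integer accumulator
        for j, s in enumerate(samples):
            c, si = cordic_cos_sin(2 * math.pi * ((j * k) % n) / n)
            acc += s * (c + si)             # cas = cos + sin
        out.append(acc / ONE)               # convert back to a real value
    return out
```

The accuracy depends on `FRAC_BITS` and `STAGES`; increasing both narrows the gap to a floating-point DHT at the cost of wider adders and more pipeline stages.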


In the present work, the discrete Hartley transform for an input matrix was described in VHDL and implemented on an FPGA. The DHT was also calculated for an 8-point input using two algorithms and their effectiveness was discussed. The work focuses primarily on image compression with less computation and low power. The simulation results and design summary of the DHT were obtained, and they show that the implemented architecture is efficient in its use of area and time. The hardware utilization is low, and power analysis shows that the power requirement is also low. However, if the input values are large, the results can overflow the registers and errors occur. This can be rectified by storing the transformed coefficients in wider registers. Also, because of quantization of the ROM contents, the even-numbered outputs deviate more from the desired results than the odd-numbered outputs.
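How much wider the registers need to be can be estimated from the worst-case growth of the transform. The sketch below is an illustrative calculation, not a figure from the paper: it assumes two's-complement inputs, an unnormalized N-point DHT, and a conservative bound that treats the most negative input magnitude as reachable with either sign. The function name is hypothetical.

```python
import math

def dht_output_bits(n_points, input_bits):
    """Signed register width that cannot overflow for an n-point DHT.

    input_bits is the width of the two's-complement input samples.
    """
    max_input = 1 << (input_bits - 1)       # largest input magnitude, e.g. 128 for 8 bits
    # Worst-case gain: the largest sum of |cas| kernel values over any output index k.
    worst_gain = max(
        sum(abs(math.cos(2 * math.pi * j * k / n_points)
                + math.sin(2 * math.pi * j * k / n_points))
            for j in range(n_points))
        for k in range(n_points)
    )
    max_output = max_input * worst_gain
    # Magnitude bits needed to hold max_output, plus one sign bit.
    return math.ceil(math.log2(max_output + 1)) + 1
```

For an 8-point transform of 8-bit signed samples this bound asks for 12-bit result registers, four bits more than the input width, which is consistent with the overflow seen when coefficients are kept in input-sized registers.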


[1] S.K.Pattanaik and K.K.Mahapatra, “DHT Based JPEG Image Compression Using a Novel Energy Quantization Method” IEEE International conference on industrial technology, pp. 2827 – 2832, Dec 2006.

[2] R. N. Bracewell, O. Buneman, H. Hao and J. Villasenor, “Fast two-dimensional Hartley transform,” Proceedings of the IEEE, vol. 74, no. 9, Sept. 1986.

[3] C. H. Paik and M. D. Fox, “Fast Hartley transform for image processing,” IEEE Trans. Med. Imaging, vol. 7, no. 6, pp. 149–153, Jun. 1988.

[4] P. K. Meher, S. Thambipillai and J. C. Patra, “Scalable and modular memory-based systolic architectures for discrete Hartley transform,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 53, pp. 1065–1077, May 2006.

[5] A. Amira, “An FPGA based system for discrete Hartley transforms,” IEEE publication, pp. 137–140, 2003.

[6] R.C. Gonzalez, R.E. Woods, Digital Image Processing, Pearson Education 3rd Edition 2008

[7] C. R. Baugh and B. A. Wooley, “A two’s complement parallel array multiplication algorithm,” IEEE Transactions on Computers, vol. C-22, pp. 1045–1047, Dec. 1973.

[8] Bracewell, Ronald N. “The Hartley transform” New York: Oxford university press 1986

[9] Ranjan Bose, Information theory coding and Cryptography, Tata McGraw-Hill 2003.

[10] F. Vahid and T. Givargis, Embedded System Design: A Unified Hardware/Software Introduction, Wiley India (P.) Ltd, 3rd edition, 2009.

[11] H.S. Hou, “The fast Hartley transform algorithm,” IEEE Transactions on Computers, vol. C-36, no. 2, pp. 147–156, Feb. 1987.

[12] L.W. Chang and S.W. Lee, “Systolic arrays for the discrete Hartley transform,” IEEE Transactions Signal Processing, vol. 39, no. 11, pp. 2411– 2418, Nov. 1991.

[13] Miodrag Popović and Dragutin Šević, "A new look at the comparison of the fast Hartley and Fourier transforms," IEEE Trans. Signal Processing 42 (8), 2178-2182 (1994)