

IMPLEMENTATION OF VLSI-BASED IMAGE COMPRESSION APPROACH ON RECONFIGURABLE COMPUTING SYSTEM - A SURVEY

D. U. Shah1, R. B. Ambaliya2
  1. Assistant Professor, Dept. of Electronics and Communication, R.K. University, Rajkot, Gujarat, India
  2. Assistant Professor, Dept. of Electronics and Communication, R.K. University, Rajkot, Gujarat, India

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Abstract

Image data require large amounts of disk space and high bandwidth for transmission. Image compression is therefore necessary to reduce the amount of data required to represent a digital image, and efficient compression techniques are in high demand. Although many compression techniques are available, a technique that is fast, memory efficient and simple best meets user requirements. In this paper, image compression, the need for compression, its principles, how image data can be compressed, and the main image compression techniques are reviewed and discussed. A wavelet-based image compression algorithm using the Discrete Wavelet Transform (DWT) based on the B-spline factorization technique is also discussed in detail. Based on the review, some general guidelines for choosing a suitable compression algorithm for an image are recommended. Finally, applications and the future scope of image compression techniques are discussed, considering their development on FPGA systems.

Keywords

Image compression, Discrete Wavelet Transform, Decomposed Lifting Algorithm (DLA), Huffman coding.

INTRODUCTION

The need for efficient image compression techniques is ever increasing, because raw images require large amounts of disk space, which is a serious disadvantage for transmission and storage. Even though many compression techniques already exist, a better technique that is faster, memory efficient and simple best suits the requirements of the user [11]. Image compression is the application of data compression to digital images. Its objective is to reduce the redundancy of the image data so that the data can be stored or transmitted in an efficient form. The main goal of image compression is the best image quality at a given bit-rate (or compression rate). There are two forms of compression: lossless and lossy techniques [9]. Lossy compression that produces imperceptible differences can be called visually lossless [18, 19]. Because lossless image compression preserves the quality of the compressed image, the compression ratio it achieves is very low; hence, one cannot save resources significantly by using lossless compression alone. Lossy image compression, in contrast, compromises the resultant image quality in ways the viewer barely notices; the loss in image quality increases the achievable compression and thus saves resources [18, 19]. Wavelet theory and its application to image compression have been well developed over the past decade [18-21]. The field of wavelets is still sufficiently new that further advancements will continue to be reported in many areas. The wavelet transform is one of the major processing components of image compression [17].
The DWT-based compression phase is mainly divided into three sequential steps: (1) discrete wavelet transform, (2) quantization, and (3) entropy encoding. After pre-processing, each component is independently analysed by an appropriate discrete wavelet transform [12]. Since the emergence of JPEG 2000, considerable attention has been paid to the development of efficient system architectures for the DWT, and FPGA implementations can accelerate the DWT by pipelining these operations. Several VLSI architectures based on the DWT [14, 15] have been designed and implemented in order to achieve real-time signal processing [15, 16]. Many architectures have been proposed for the implementation of the DWT. For the 1-D DWT, the architectures can be categorized into convolution-based [24], lifting-based [25], [26], and B-spline-based [27] designs. The first implements the two-channel filter banks directly. The second exploits the relationship between the low-pass and high-pass filters to save multipliers and adders [28], [13]. The third reduces the number of multipliers through B-spline factorization [27]; B-spline-based architectures can provide fewer multipliers in cases where the lifting scheme cannot reduce the complexity further.
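To make the contrast between the convolution-based and lifting-based 1-D DWT formulations concrete, the following minimal Python/NumPy sketch (an illustration under assumed names, not any of the cited architectures) computes one level of the 1-D LeGall 5/3 DWT with two lifting steps; the equivalent convolution-based form would, with this normalization, filter with the 5-tap low-pass [-1/8, 1/4, 3/4, 1/4, -1/8] and the 3-tap high-pass [-1/2, 1, -1/2] before downsampling, which needs more multipliers per output sample.

```python
import numpy as np

def dwt53_lifting(x):
    """One level of the 1-D LeGall 5/3 DWT via lifting (real-valued version).
    Assumes an even-length input and whole-sample symmetric extension."""
    x = np.asarray(x, dtype=float)
    s = x[0::2].copy()                        # even samples -> low-pass branch
    d = x[1::2].copy()                        # odd samples  -> high-pass branch
    # Predict step: d[i] -= 0.5 * (s[i] + s[i+1]), mirroring at the right edge.
    d -= 0.5 * (s + np.append(s[1:], s[-1]))
    # Update step:  s[i] += 0.25 * (d[i-1] + d[i]), mirroring at the left edge.
    s += 0.25 * (np.append(d[0], d[:-1]) + d)
    return s, d                               # approximation and detail bands
```

Only two constant multipliers (0.5 and 0.25) appear per sample pair, which is the saving the lifting-based architectures exploit; the reversible integer 5/3 used in JPEG 2000 additionally rounds each lifting step.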
In wavelet image compression, quantization is performed after the wavelet transform. Quantization is a lossy step that maps a range of values to a single quantum value; when the number of discrete symbols in a given stream is reduced, the stream becomes more compressible. For example, reducing the number of colors required to represent a digital image makes it possible to reduce its file size. Specific applications include DCT data quantization in JPEG and DWT data quantization in JPEG 2000. After quantization, the quantized DWT coefficients are converted into sign-magnitude representation prior to entropy coding because of the inherent characteristics of the entropy encoding process. Entropy coding can yield a much shorter image representation on average by using short code words for likely symbols and longer code words for less likely symbols [22]. Entropy encoding, which is a lossless form of compression, is performed on the quantized data for more efficient storage. Normally 8 or 16 bits are required to store a pixel of a digital image; with efficient entropy encoding, fewer bits suffice on average, which results in less memory being used to store or transmit an image. The Karhunen-Loeve transform makes it possible to pick the best basis, minimizing entropy and error, so that an image is better represented for storage or transmission. Shannon-Fano coding, Huffman coding, Kolmogorov entropy and arithmetic coding are among the schemes used by engineers [23].
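As a concrete illustration of the quantization, sign-magnitude and entropy-coding chain described above, here is a short, hedged Python sketch; the function names, the uniform step size and the use of Huffman coding directly on coefficient magnitudes are illustrative assumptions, not the JPEG 2000 procedure.

```python
import heapq
from collections import Counter

def quantize(coeffs, step):
    """Uniform quantization with truncation toward zero: each coefficient is
    mapped to a (sign, magnitude) pair, i.e. a sign-magnitude representation."""
    return [(1 if c < 0 else 0, int(abs(c) // step)) for c in coeffs]

def huffman_code(symbols):
    """Build a Huffman table: short code words for frequent symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:                          # degenerate single-symbol case
        return {next(iter(freq)): '0'}
    heap = [[w, i, {s: ''}] for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        merged = {s: '0' + c for s, c in lo[2].items()}
        merged.update({s: '1' + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tie, merged])
        tie += 1
    return heap[0][2]

# Toy usage: quantize a few wavelet coefficients, then entropy-code the magnitudes.
pairs = quantize([12.4, -3.1, 0.4, 0.2, -0.1, 7.9], step=2.0)
magnitudes = [m for _, m in pairs]
table = huffman_code(magnitudes)
bitstream = ''.join(table[m] for m in magnitudes)
```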
This paper is organized as follows: Section I introduces image compression, Section II reviews related work on image compression, and Section III concludes the paper.

REVIEW OF RELATED WORKS

A handful of low-power architectures for computing wavelet coefficients have been presented in the literature. The proposed methodology concentrates mainly on wavelet-based image compression; here, a low-power architecture is used to compute the wavelet transform so that high overall efficiency can be achieved. We first present some of the low-power architectures for the wavelet transform reported in the literature, which are mostly based on lifting, distributed arithmetic and B-spline factorization. C. T. Huang et al. [1] have proposed an architecture for the Discrete Wavelet Transform (DWT) based on B-spline factorization. The B-spline factorization mainly consists of a B-spline part and a distributed part. The former was proposed to be constructed using either a direct implementation or a Pascal implementation, while the latter, the part introducing multipliers, was implemented with Type-I or Type-II polyphase decomposition. Since the degree of the distributed part was designed to be as small as possible, the proposed architectures could use fewer multipliers than previous designs, although more adders were required. However, many of these adders could be implemented with smaller area and lower speed because only a few adders lie on the critical path. Three case studies, covering the JPEG 2000 default (9, 7) filter, the (6, 10) filter, and the (10, 18) filter, were given to demonstrate the efficiency of the proposed architecture.
On the other hand, Kai Liu et al. [4] have proposed a VLSI architecture that performs the line-based discrete wavelet transform (DWT) using a lifting scheme. The architecture consists of row processors, column processors, an intermediate buffer and a control module. The row and column processors act as the horizontal and vertical filters, respectively; the intermediate buffer is composed of five FIFOs that store temporary results of the horizontal filter; and the control module schedules the order of output to external memory. Compared with existing designs, the presented architecture parallelizes all levels of the wavelet transform to compute the multilevel DWT within one image transmission time. Furthermore, Wang Chao and Cao Peng [5] have proposed a Decomposed Lifting Algorithm (DLA), in which the image data are processed in raster-scan order in both the row processor and the column processor. Theoretical analysis indicated that the precision of the DLA outperforms other lifting-based algorithms in terms of round-off noise and internal word length. An efficient line-based architecture was designed to perform the 2-D DWT based on the DLA with high performance and low memory by eliminating the data buffer. For an N×N image, only 4N words of internal memory are required for the 9/7 filter, with an output latency of 2N clock cycles. Compared with related 2-D DWT architectures, the on-chip memory size and output latency are reduced significantly under the same arithmetic cost, memory bandwidth and timing constraints. Xixin Cao et al. [3] have proposed an efficient and simple architecture for the 9/7 Discrete Wavelet Transform based on Distributed Arithmetic. To derive the proposed architecture, they exploited the periodicity and symmetry of the DWT to optimize performance and reduce computational redundancy. The inner product with the DWT coefficient matrix was distributed over the input through careful analysis of the input, output and coefficient word lengths, and linear maps were used to assign the necessary computations to processing elements in the space domain. Moreover, the proposed architecture has regular data flow and low control complexity. The result was a low-hardware-complexity DWT processor for the 9/7 transform that allows a clock twice as fast as a direct implementation, making the design well suited to image compression systems such as JPEG 2000 and MPEG-4. Mohsen Amiri Farahani and Mohammad Eshghi [6] have implemented a design of the Discrete Wavelet Packet Transform with efficient hardware acceleration. This design is based on a word-serial pipeline architecture and parallel filter processing: to accelerate the Discrete Wavelet Packet Transform, a high-pass filter and a low-pass filter are used concurrently in each level. Using parallel filters makes this design twice as fast as the design introduced in [7]. The architecture was implemented using the internal multipliers of the FPGA, and implementation results for different filter lengths were presented. This high-speed architecture is suitable for on-line applications and can be implemented for the Discrete Wavelet Packet Transform with any number of tree levels.
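For reference, the per-sample arithmetic that these 9/7 lifting datapaths pipeline can be sketched in software as four predict/update passes. The constants below are the commonly cited rounded CDF 9/7 lifting factors, the final per-band scaling step is omitted, and the function is an assumed illustration, not the DLA or the line-based buffering schemes themselves.

```python
import numpy as np

# Commonly cited (rounded) CDF 9/7 lifting constants.
A, B, C, D = -1.586134342, -0.052980118, 0.882911076, 0.443506852

def lift_97(x):
    """One level of the 1-D 9/7 analysis via lifting (even-length input,
    whole-sample symmetric extension at the borders)."""
    x = np.asarray(x, dtype=float)
    s, d = x[0::2].copy(), x[1::2].copy()          # even / odd polyphase split
    right = lambda a: np.append(a[1:], a[-1])      # a[i+1] with edge mirroring
    left  = lambda a: np.append(a[0], a[:-1])      # a[i-1] with edge mirroring
    d += A * (s + right(s))                        # predict 1
    s += B * (left(d) + d)                         # update 1
    d += C * (s + right(s))                        # predict 2
    s += D * (left(d) + d)                         # update 2
    return s, d    # a final scaling (omitted here) normalizes the subband gains
```

A line-based hardware design evaluates these same four steps while streaming the image row by row, which is why only a few lines of intermediate results need to stay on chip.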
C. T. Huang et al. [2] have presented a detailed analysis of very large scale integration (VLSI) architectures for the one-dimensional (1-D) and two-dimensional (2-D) discrete wavelet transform (DWT) in many aspects, and three related architectures were proposed as well. The 1-D DWT and inverse DWT (IDWT) architectures were classified into three categories: convolution-based, lifting-based, and B-spline-based, and were discussed in terms of hardware complexity, critical path, and registers. For the 2-D DWT, the large amount of frame-memory access and the die area occupied by the embedded internal buffer become the most critical issues, so the 2-D DWT architectures were categorized and analyzed by their external memory scan methods. The implementation issues of the internal buffer were also discussed, and some real-life experiments were given to show that the area and power of the internal buffer are highly related to the memory technology and working frequency, rather than to the required memory size alone. Besides the analysis, a B-spline-based IDWT architecture and an overlapped stripe-based scan method were also proposed. Lastly, they proposed a flexible and efficient architecture for a one-level 2-D DWT that exploits many advantages of the presented analysis.
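To visualize the separable row/column decomposition that these 2-D DWT architectures schedule in hardware, the following self-contained sketch performs one decomposition level with the Haar filter for brevity (the reviewed designs use the 9/7 filter together with on-chip line buffers); the function name, the even-sized input and the subband naming are assumptions.

```python
import numpy as np

def dwt2_haar(img):
    """One level of a separable 2-D DWT (Haar for simplicity): filter and
    downsample the rows, then the columns, giving four subbands."""
    img = np.asarray(img, dtype=float)
    # Row pass: pairwise average (low-pass) and difference (high-pass).
    lo = (img[:, 0::2] + img[:, 1::2]) / 2
    hi = (img[:, 0::2] - img[:, 1::2]) / 2
    # Column pass on both row-pass outputs.
    LL = (lo[0::2, :] + lo[1::2, :]) / 2   # coarse approximation
    LH = (lo[0::2, :] - lo[1::2, :]) / 2   # detail subbands (naming
    HL = (hi[0::2, :] + hi[1::2, :]) / 2   # conventions vary between
    HH = (hi[0::2, :] - hi[1::2, :]) / 2   # references)
    return LL, LH, HL, HH
```

A multilevel decomposition repeats the same step on the LL band; in hardware, the frame-memory traffic this recursion generates is exactly what the external memory scan methods discussed above try to minimize.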
Several complete VLSI implementations for image compression are also presented in the literature. A. Kumar Gupta et al. [8] have presented the VLSI design of a Block Coder (BC) system that can process 21 megapixels per second. For the Bit-Plane Coder (BPC), they employed a Concurrent Symbol Processing (CSP) algorithm to process all 4 sample locations within a stripe-column in a single clock cycle during a pass. The BPC produced, on average, 1.21 Context-Data (CxD) pairs per clock cycle. In addition, they designed an Arithmetic Coder (AC) that processes 2 CxD pairs per clock cycle, and, to allow an efficient coupling of the proposed BPC and AC modules, they also proposed an architecture for an intermediate buffer. The BC chip, implemented in TSMC 0.18 µm technology, occupied an area of 1.6 mm² with an equivalent gate count of 95,000, including 24,576 memory bits. Its processing throughput was the highest reported for a JPEG 2000 BC engine capable of handling both normal and causal modes of operation.
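Since a bit-plane coder consumes quantized coefficients in sign-magnitude form one magnitude bit plane at a time, the following hedged sketch shows that decomposition only; the function name is assumed, and JPEG 2000's actual three-pass, stripe-column scanning is considerably more involved than this.

```python
def to_bit_planes(quantized, num_planes=8):
    """Split quantized coefficients into a sign array plus magnitude bit planes,
    most significant plane first, in the order a bit-plane coder scans them."""
    signs = [1 if q < 0 else 0 for q in quantized]
    mags = [abs(q) for q in quantized]
    return signs, [[(m >> p) & 1 for m in mags]
                   for p in range(num_planes - 1, -1, -1)]

# Example with 4 planes: coefficients -5, 3, 0, 7.
signs, planes = to_bit_planes([-5, 3, 0, 7], num_planes=4)
# signs  == [1, 0, 0, 0]
# planes == [[0, 0, 0, 0], [1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 1]]
```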
C. Hemasundara Rao and M. Madhavi Latha [9] have proposed a hybrid image compression technique based on reversible blockade transform coding. The technique, implemented over regions of interest (ROIs), is based on selecting coefficients that belong to different transforms, depending on the region. This method allows: (1) codification of multiple kernels at various degrees of interest, (2) an arbitrarily shaped spectrum, and (3) flexible adjustment of the compression quality of the image and the background. No modification of the standard JPEG 2000 decoder was required. The method was applied to different types of images, and the results showed better performance for the selected regions when the image coding methods were employed over the whole set of images. Finally, a VLSI implementation of the proposed method was shown, and it was also shown that the Hartley and Cosine transform kernels gave better performance than the other models. Isa Servan Uzun and Abbes Amira [10] have presented a field-programmable gate array (FPGA) implementation of a non-separable 2-D DBWT architecture, which is the heart of their proposed high-definition television (HDTV) compression system. The architecture adopts periodic symmetric extension at the image boundaries and therefore conforms to the JPEG 2000 standard. Hardware implementation results based on a Xilinx Virtex-2000E FPGA chip showed that the 2-D DBWT is processed at 105 MHz, providing a complete solution for the real-time computation of the 2-D DBWT for HDTV compression.

CONCLUSION

Different wavelet-based image compression techniques using the DWT have been studied. It is found that DWT-based JPEG 2000 is the best choice for improving the compression rate. More proficient techniques that further improve the compression rate while ensuring low power consumption should be developed in the future by realizing them on FPGAs.

References