ABSTRACT— A novel algebraic integer (AI) based multi-encoding of Daubechies-12 2-D wavelet filters having error-free integer-based computation is presented. Digital VLSI architectures employing parallel channels are proposed, physically realized and tested. The multi-encoded AI framework allows a multiplication-free and computationally accurate architecture. It also guarantees a noise-free computation throughout the multi-level multi-rate 2-D filtering operation. A single final reconstruction step (FRS) furnishes filtered and down-sampled image outputs in fixed-point, resulting in low levels of quantization noise. Daubechies-6 and -12 designs are compared in terms of SNR, PSNR, hardware structure and power consumption for different word lengths. SNR and PSNR improvements of approximately 41% were observed in favor of AI-based systems, when compared to 8-bit fixed-point schemes (six fractional bits). Further, FRS designs based on canonical signed digit representation and on expansion factors are proposed. The Daubechies-6 and -12 4-level VLSI architectures are prototyped on a Xilinx Virtex-6 xc6vc240t-1ff1156 FPGA device at 282 MHz and 146 MHz, respectively, with dynamic power consumption of 164 mW and 339 mW, respectively, and verified on FPGA chip using an ML605 platform.

KEYWORDS— Algebraic integer encoding, Daubechies wavelets, error-free algorithm, fixed-point scheme, sub-band coding, VLSI.

I. INTRODUCTION

The field of discrete wavelet transforms (DWT) has been attracting substantial interest, in part because wavelet analysis can decompose a signal into a particular set of basis functions equipped with good spectral properties [1]-[4]. Wavelet analysis has been used to detect system non-linearities by making use of its localization feature [5]. DWT-based multi-resolution analysis leads to both time and frequency localization [4], [6]-[9].

Indeed, wavelet filter banks provide strong support for many signal processing systems [10]. Wavelets are employed in numerical analysis [11], [12], real-time processing [11], image compression and reconstruction [3], [13]-[17], pattern recognition [11], biomedicine [12], approximation theory, computer graphics [18], and image and video coding standards (H.265) [19], [20]. Much research effort has been devoted to reducing the computational and circuit complexities of DWT hardware architectures in VLSI systems [2], [11], [13], [17].

A particular class of DWT is the Daubechies wavelets [24]. They are well-suited and commonly used in image compression applications [3]. Herein we refer to the Daubechies wavelets generated from 12-tap filter banks as Daub-12 wavelets. In particular, whereas the Daub wavelets are often employed in applications where the signals are smooth and slowly varying, the Daub-12 wavelets are used for signals bearing abrupt changes, spikes, and high levels of undesired noise [11]. Daub-4 wavelets can be highly localized to smooth [2] and Daub-6 wavelets have found applications in medical imaging, such as wireless capsule endoscopy, where fine image details are regarded as important [11].

Since wavelets can be associated with specific filter banks, practical wavelet analysis is achieved by means of sub-band coding [24]. Sub-band coding is a basic filtering principle which splits a given signal into several frequency bands for subsequent encoding. In particular, 2-D multi-resolution analysis is obtained via sub-band coding [12], [13].

We propose a new multi-encoding technique that achieves exact computation of multi-level 2-D Daubechies wavelet transforms using algebraic integer (AI) encoding. Compared to existing AI designs in the literature [1]-[3], [6], [11], [13], [19], the proposed design can compute wavelet image approximations entirely over integer fields and with a single FRS in a purely AI-based 2-D architecture. The design avoids the need for intermediate reconstruction steps. Moreover, the proposed architecture is sought to be multiplier-free. Such designs facilitate accuracy, speed, and relatively smaller on-chip area as well as lower design cost. The new design is multi-encoded and multi-rate, operating over AI with no intermediate reconstruction steps.

In this framework, error-free computations can be performed until the FRS. Our architecture emphasizes output image quality and speed, trading complexity and power consumption for accuracy. The final reconstruction step (FRS) procedure for the proposed analyses is described. Alternative FRS schemes were also sought for the Daub-12 case.

Field programmable gate array (FPGA) implementation results, hardware resource consumption, and power consumption are provided for the 12-tap filters. Daubechies-6 and -12 designs are compared in terms of SNR, PSNR, hardware structure, and power consumption for different word lengths. SNR and PSNR improvements of approximately 41% are observed in favor of AI-based systems when compared to 8-bit fixed-point schemes (six fractional bits).

II. CONTRIBUTIONS

A. The Problem of Fixed-Point Errors

Filter banks associated with Daubechies wavelets have irrational coefficients whose representation in fixed-point requires truncation or rounding off [2], [6]. Such approximations introduce representation errors which propagate through a given filter bank. Moreover, the longer the required filter bank is, the greater the computational error may become. This lowers the signal-to-noise ratio of the resulting data.
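
To make the effect concrete, the following minimal Python sketch quantizes the standard closed-form Daub-6 low-pass coefficients to six fractional bits and records the worst-case representation error. The choice of the 6-tap filter, the round-to-nearest rule, and all function and variable names are assumptions made for illustration; they do not describe the proposed hardware.

```python
import math

# Standard closed-form Daub-6 low-pass coefficients (floating-point reference).
z1 = math.sqrt(10.0)
z2 = math.sqrt(5.0 + 2.0 * z1)
scale = 1.0 / (16.0 * math.sqrt(2.0))
h_exact = [
    scale * (1 + z1 + z2),
    scale * (5 + z1 + 3 * z2),
    scale * (10 - 2 * z1 + 2 * z2),
    scale * (10 - 2 * z1 - 2 * z2),
    scale * (5 + z1 - 3 * z2),
    scale * (1 + z1 - z2),
]


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (assumed rounding rule)."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step


FRAC_BITS = 6  # six fractional bits, as in the 8-bit fixed-point scheme
h_fixed = [quantize(c, FRAC_BITS) for c in h_exact]

# Per-coefficient representation error introduced before any filtering occurs.
errors = [abs(a - b) for a, b in zip(h_exact, h_fixed)]
max_error = max(errors)
```

With round-to-nearest, each coefficient error is bounded by half a quantization step, and these errors are then carried through every subsequent filtering stage.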

B. Prior Art on AI-Based DWT

AI encoding can address the computational noise injection in wavelet analysis systems [6]. Pioneered by Cozzens and Finkelstein [3], AI quantization has been employed in several signal processing schemes, including wavelet and discrete cosine transform analysis [1], [11], [13]. A significant advantage of the AI encoding is its capability of mapping the required irrational wavelet coefficients into vectors or arrays of integers. Therefore, wavelet decomposition can be performed without errors in a vectorial framework consisting exclusively of integer operations.

Thus, the irrational coefficients of the Daubechies filters can be mapped to integers according to a selected AI basis [3], [6]. AI encoding schemes require a reconstruction step to convert the resulting AI-encoded quantities back into fixed-precision binary. The design of digital architectures for the 1-D Daub-12 filters was pioneered by Wahid and Dimitrov in the recent past. Importantly, the 2-D architectures proposed by Wahid et al. [1]-[3], [6], [13] require intermediate reconstruction steps that map the AI-encoded transform coefficients back into fixed-point format.

These are 1-D DWT architectures that compute the 2-D DWT by repeated use of a 1-D AI-encoded architecture. The intermediate reconstruction step is located after the first application of the transform (say, along rows), before the resulting data is submitted to the next (say, column-wise) stage. In other words, it sits at the transposition stage between the two series of 1-D transforms. Such an intermediate reconstruction step injects quantization noise and introduces transfer-function response errors. When multi-level decompositions are attempted, the problem is compounded because the intermediate reconstruction stages are applied repeatedly at each level of filtering [3], [6], [13], [19].

Errors incurred in the intermediate reconstructions diminish the benefits of using AI encoding for 2-D multi-level DWTs. This is an outstanding problem in the current literature which we identify and correct in the present contribution.

C. Proposed Encoding Scheme

We introduce a multi-encoding method that provides error-free computation across the 2-D decomposition levels. In our method, the reconstruction step appears only once, at the final level of decomposition and filtering. Unlike the schemes described in [2], [3], [6], [19], our scheme operates entirely over the AI representation, up to a single and final reconstruction block, without any intermediate reconstruction steps.

Thus, the FRS is the only possible source of computational errors. In view of the above, we propose a new AI-based architecture for sub-band coding of images using 2-D Daub-12 wavelet filters. The AI quantization approach leads to an architecture possessing a parallel channel structure. Input data is successively wavelet decomposed over several levels according to application requirements.

The single FRS employs constant coefficient multipliers based on canonical signed digit (CSD) representation, offering low circuit complexity. This architecture facilitates very low levels of uncorrelated and uncoupled quantization noise in the final decomposed image data.

III. REVIEW OF SUB-BAND CODING

Wavelet decomposition of input image data can be accomplished by sub-band coding. A 2-D finite impulse response (FIR) filter bank processes the input data, resulting in approximation and detail sub-images. The input image \( A_{in} \) is of resolution \( N \times N \) pixels and is input to a pair of low-pass (approximation) and high-pass (detail) filters.

The filters operate column-wise on the image followed by dyadic down-sampling, i.e., only one of every two columns is retained. Then the same process is applied row-wise. The outputs are four sub-images \( A_{ah}, D_v, D_h, D_d \) which represent the 2-D wavelet coefficients for the coarse approximation, vertical details, horizontal details, and diagonal details, respectively. Separate symbols are used to denote the column-wise and row-wise down-sampling. The resultant sub-images are all of size \( N/2 \times N/2 \), because of dyadic down-sampling.
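
The separable filter-and-down-sample procedure described above can be sketched in plain Python as follows. This is an illustrative sketch only: the circular boundary extension, the un-normalized two-tap integer filter pair used in the usage lines, and the sub-image keys `ll`, `lh`, `hl`, `hh` are assumptions, not details taken from the proposed architecture.

```python
def filter_rows(image, taps):
    """Circularly filter each row with FIR taps, then keep every second sample."""
    out = []
    for row in image:
        n = len(row)
        filtered = [
            sum(t * row[(i - k) % n] for k, t in enumerate(taps))
            for i in range(n)
        ]
        out.append(filtered[::2])  # dyadic down-sampling halves the column count
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def filter_columns(image, taps):
    """Apply the same filter-and-down-sample step along the other dimension."""
    return transpose(filter_rows(transpose(image), taps))


def decompose_one_level(image, lo, hi):
    """One level of separable 2-D sub-band coding: four quarter-size sub-images."""
    low = filter_rows(image, lo)
    high = filter_rows(image, hi)
    return {
        "ll": filter_columns(low, lo),   # coarse approximation
        "lh": filter_columns(low, hi),   # detail sub-image
        "hl": filter_columns(high, lo),  # detail sub-image
        "hh": filter_columns(high, hi),  # detail sub-image
    }


# Usage with an assumed un-normalized integer filter pair on an 8 x 8 image.
N = 8
image = [[r * N + c for c in range(N)] for r in range(N)]
sub = decompose_one_level(image, lo=[1, 1], hi=[1, -1])
```

Because the taps and pixels are integers, every sub-image stays integer valued; repeating the call on `sub["ll"]` gives the next, coarser level.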

\[
\begin{bmatrix}
1+\sqrt{10}+\sqrt{5+2\sqrt{10}} \\
5+\sqrt{10}+3\sqrt{5+2\sqrt{10}} \\
10-2\sqrt{10}+2\sqrt{5+2\sqrt{10}} \\
10-2\sqrt{10}-2\sqrt{5+2\sqrt{10}} \\
5+\sqrt{10}-3\sqrt{5+2\sqrt{10}} \\
1+\sqrt{10}-\sqrt{5+2\sqrt{10}}
\end{bmatrix}
\]

These particular filters possess irrational quantities as shown. These operations can be performed recursively [24]. As a result, after each iteration a coarser approximation can be achieved. Let the original image to be analyzed be denoted by \( A_0 \). The 2-D FIR filter bank based on the Daub-12 filter bank is of particular relevance [2], [14]. Let the low-pass filter associated with this filter bank be denoted as \( h^{(\text{Daub-12})} \).

IV. AI-BASED DAUBECHIES-12 SCALING FILTER

A. Mathematical Background

An algebraic integer is a real or complex number that is a root of a monic polynomial with integer coefficients [11],[13]. Algebraic integers can be employed to define encoding mappings which can precisely represent particular irrational numbers by means of usual integers.
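
As an illustration of such an encoding, the sketch below represents numbers of the form a0 + a1·ζ1 + a2·ζ2 + a3·ζ1ζ2 by four ordinary integers, assuming the basis elements ζ1 = √10 and ζ2 = √(5 + 2√10). Both are algebraic integers (roots of the monic polynomials x² − 10 and x⁴ − 10x² − 15, respectively). The tuple layout and the function names are hypothetical and chosen only for this example.

```python
import math

Z1 = math.sqrt(10.0)
Z2 = math.sqrt(5.0 + 2.0 * Z1)


def ai_add(a, b):
    """Add two AI-encoded numbers component-wise (integer additions only)."""
    return tuple(x + y for x, y in zip(a, b))


def ai_mul(a, b):
    """Multiply two AI-encoded numbers exactly.

    Uses the relations Z1*Z1 = 10 and Z2*Z2 = 5 + 2*Z1 to fold every
    product of basis elements back onto the basis (1, Z1, Z2, Z1*Z2).
    """
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    c0 = a0 * b0 + 10 * a1 * b1 + 5 * a2 * b2 + 20 * (a2 * b3 + a3 * b2) + 50 * a3 * b3
    c1 = a0 * b1 + a1 * b0 + 2 * a2 * b2 + 5 * (a2 * b3 + a3 * b2) + 20 * a3 * b3
    c2 = a0 * b2 + a2 * b0 + 10 * (a1 * b3 + a3 * b1)
    c3 = a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1
    return (c0, c1, c2, c3)


def ai_to_float(a):
    """Decode an AI tuple to floating point; the only inexact step."""
    return a[0] + a[1] * Z1 + a[2] * Z2 + a[3] * Z1 * Z2


# Product of two irrational quantities, computed without any rounding.
p = ai_mul((1, 1, 1, 0), (5, 1, 3, 0))
```

All intermediate results remain integer tuples, so no rounding occurs until `ai_to_float` is called.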

Thus, setting aside the quantities \( 1/\beta_1 = 4\sqrt{2} \) and \( 1/\beta_2 = 16\sqrt{2} \) as scaling factors, the Daub-6 filter coefficients can be represented as

\[
h^{(\text{Daub-6})} = \begin{bmatrix}
1+\zeta_1+\zeta_2 \\
5+\zeta_1+3\zeta_2 \\
10-2\zeta_1+2\zeta_2 \\
10-2\zeta_1-2\zeta_2 \\
5+\zeta_1-3\zeta_2 \\
1+\zeta_1-\zeta_2
\end{bmatrix}
\]

where \( \zeta_1 = \sqrt{10} \) and \( \zeta_2 = \sqrt{5+2\sqrt{10}} \).

In a similar fashion, the Daub-12 filter can be put into the AI formalism. Using the same scaling factors, we introduce the Daub-12 filter as a parallel connection of two Daub-6 filters, as follows.

\[
h^{(\text{Daub-12})} = h^{(\text{Daub-6})} + h^{(\text{Daub-6})}
\]

Therefore, the un-normalized low-pass filters underlying the 12-tap design can be split into separate integer filters given by

\[
\begin{aligned}
h^{(\text{Daub-4})} &= h_1 + \zeta\, h_2 \\
h^{(\text{Daub-6})} &= h_3 + \zeta_1 h_4 + \zeta_2 h_5
\end{aligned}
\]

where

\[
\begin{aligned}
h_1 &= [1\ \ 3\ \ 3\ \ 1]^T \\
h_2 &= [1\ \ 1\ \ {-1}\ \ {-1}]^T \\
h_3 &= [1\ \ 5\ \ 10\ \ 10\ \ 5\ \ 1]^T \\
h_4 &= [1\ \ 1\ \ {-2}\ \ {-2}\ \ 1\ \ 1]^T \\
h_5 &= [1\ \ 3\ \ 2\ \ {-2}\ \ {-3}\ \ {-1}]^T
\end{aligned}
\]

and the superscript \( T \) denotes transposition.

Therefore, the Daub-12 filter bank analysis can be separated into two/three structures. This facilitates a two/four integer channel structure, where the integer coefficient filters \( h_1, h_2, h_3, h_4 \) are considered. All implied computations are necessarily over an integer field. Notice that a usual integer \( m \) can be effortlessly represented in either basis:

\[
m = m + 0 \cdot \zeta = m + 0 \cdot \zeta_1 + 0 \cdot \zeta_2,
\]

This is relevant for encoding image pixel values, which are integers. In practical terms, this means that no circuitry for encoding integer input data is necessary. The AI-based Daub-6 filter bank is shown in Fig. 1. These filters possess zero initial conditions.
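
The parallel integer-channel idea can be sketched for a 1-D signal as follows. The example assumes the 6-tap building block with ζ1 = √10, ζ2 = √(5 + 2√10), and scaling 1/(16√2); the channel tap vectors, the helper names, and the sample pixel row are illustrative assumptions. Integer pixels enter every channel directly, and irrational quantities appear only in the single reconstruction at the end.

```python
import math

# Integer tap vectors of the three channels (assumed Daub-6 split).
H_A = [1, 5, 10, 10, 5, 1]     # rational part
H_B = [1, 1, -2, -2, 1, 1]     # multiplies zeta_1
H_C = [1, 3, 2, -2, -3, -1]    # multiplies zeta_2


def convolve(x, h):
    """Plain FIR convolution; stays in integers when x and h are integers."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def ai_filter(x):
    """Run the three integer channels in parallel; no rounding takes place."""
    return convolve(x, H_A), convolve(x, H_B), convolve(x, H_C)


def final_reconstruction(channels):
    """Single decoding step: combine channels with the irrational basis elements."""
    z1 = math.sqrt(10.0)
    z2 = math.sqrt(5.0 + 2.0 * z1)
    scale = 1.0 / (16.0 * math.sqrt(2.0))
    ya, yb, yc = channels
    return [scale * (a + z1 * b + z2 * c) for a, b, c in zip(ya, yb, yc)]


pixels = [12, 200, 37, 5, 90, 255, 0, 64]   # integer input needs no encoding
channels = ai_filter(pixels)
y_ai = final_reconstruction(channels)
```

Up to floating-point precision in the last step, `y_ai` agrees with filtering the same pixels by the irrational coefficients directly.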

B. 2-D Filtering

We now present a mathematical framework to describe the operation of the proposed AI-based multi-level encoding design. The following notation is adopted in this work. Let C be an $N \times N$ matrix with columns $c_j$, $j = 0, 1, \ldots, N-1$. The symbols $\circ$ and $\otimes$ denote the filtering operations along the rows and columns of a given image, respectively, each followed by a dyadic down-sampling stage. Here, we introduce the Daub-12 filter as a parallel connection of Daub-6 filters, as shown in Fig. 4. Increasing the filter count in this way is expected to reduce area occupation compared with the previous filter structure.

1) AI-Based Daub-6 DWT Decomposition:

In a similar fashion, the Daub-6 filter bank can be put into the AI formalism. Considering Fig. 1, we can derive the following expression:

$$\beta_2^2 \cdot A_1 = h^{\text{Daub-6}} \circ h^{\text{Daub-6}} \otimes A_0$$

where $A_0$ is the input image of integer pixel values.

2) AI-Based Daub-12 DWT Decomposition:

In a similar fashion, the Daub-12 filter bank can be put into the AI formalism. Considering Fig. 1 with the value doubled, we can derive the following expression:

$$\beta_2^2 \cdot A_1 = h^{\text{Daub-12}} \circ h^{\text{Daub-12}} \otimes A_0$$

where $A_0$ is the input image of integer pixel values.

Invoking (2), we obtain error-free integer operations. The operations described above are illustrated in Fig. 3, where a combinational block D is employed in order to furnish the AI filter bank.

Fig. 4 Daub-12 AI Filter Structure

V. FINAL RECONSTRUCTION STEP

The proposed AI-based wavelet analyses based on Daub-12 filter banks are computed entirely over extended integer fields. However, the resulting AI-encoded approximations must be converted back to standard fixed-point representation. This is required in order to interface the resulting approximation sub-images with conventional real-time systems.

Decoding operations for Daub-12 consist of explicitly performing the reconstruction computations. Fortunately, the factors $1/\sqrt{2^{10}}$ and $1/\sqrt{2^{12}}$ are powers of two, which can be conveniently realized with bit-shift operations.

Therefore, the only possible source of errors in the proposed architectures for Daub-12 is the multiplication by AI basis elements.

$$\zeta = \sqrt{3} \approx 1.7320508075688...$$

$$\xi_1 = \sqrt{10} \approx 3.16227766016838...$$

$$\xi_2 = \sqrt{5 + 2\sqrt{10}} \approx 3.36519766437824...$$

$$\xi_1 \xi_2 = \sqrt{10}\,\sqrt{5 + 2\sqrt{10}} \approx 10.6416893961141...$$

We propose two approaches for the FRS design: (i) CSD representation and (ii) expansion factor method.

A. CSD Approximation

The FRS can be directly implemented by approximating the required irrationals by rationals. One possibility is to employ CSD representation and to track the associated relative errors. CSD encoding requires only bit shifters and adders/subtractors.
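
A minimal sketch of how a CSD encoding of an irrational constant might be obtained is given below. It uses the standard non-adjacent-form recoding of the rounded fixed-point value; the function names, the choice of √10 as the constant, and the use of eight fractional bits are assumptions for illustration and do not reproduce the encodings used in the reported designs.

```python
import math


def to_csd(value, frac_bits):
    """CSD digits (each -1, 0, or +1), least-significant first.

    The positive constant is first rounded to frac_bits fractional bits,
    then recoded so that no two adjacent digits are nonzero.
    """
    n = round(value * (1 << frac_bits))
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_value(digits, frac_bits):
    """Evaluate the signed-digit expansion as a sum of signed powers of two."""
    return sum(d * 2.0 ** (i - frac_bits) for i, d in enumerate(digits))


FRAC_BITS = 8
target = math.sqrt(10.0)
digits = to_csd(target, FRAC_BITS)
approx = csd_value(digits, FRAC_BITS)
relative_error_percent = 100.0 * abs(approx - target) / target
nonzero_terms = sum(1 for d in digits if d != 0)  # one shift-and-add/subtract each
```

Each nonzero digit corresponds to one shifted addition or subtraction, so `nonzero_terms` gives a rough indication of the adder count of the constant multiplier.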

B. Expansion Factor Method

Expansion factors are scaling constants usually employed in the design of approximate discrete transforms [17]. In [12], Britanak et al. survey the topic in this context. Recently, this methodology was extended and adapted to the design of final reconstruction blocks related to AI-based architectures. An expansion factor is simply a constant that simultaneously scales a given set of real numbers into integer values. In practical terms, only approximate integers at a given error tolerance are sought.

<table>
<thead>
<tr>
<th>Word length</th>
<th>CSD Encoding</th>
<th>% Error</th>
</tr>
</thead>
<tbody>
<tr>
<td>8 bit</td>
<td>2-2-2-2-6</td>
<td>1.33</td>
</tr>
<tr>
<td>10 bit</td>
<td>2-2-2-2-2-2</td>
<td>0.021</td>
</tr>
<tr>
<td>12 bit</td>
<td>2-2-2-2-2-2</td>
<td>0.021</td>
</tr>
<tr>
<td>14 bit</td>
<td>2-2-2-2-2-2-2</td>
<td>0.0086</td>
</tr>
<tr>
<td>16 bit</td>
<td>2-2-2-2-2-2-2-2</td>
<td>0.0028</td>
</tr>
</tbody>
</table>

In mathematical terms, we have the following structure. Let the AI elements $\xi_1$, $\xi_2$, and $\xi_1 \xi_2$ constitute a vector $\zeta = [\xi_1 \ \ \xi_2 \ \ \xi_1 \xi_2]^T$. An expansion factor is a real number $\alpha^*$ that satisfies the following minimization problem ([12, p. 274]):

$$\alpha^* = \arg \min_{\alpha} \| \alpha \cdot \zeta - \text{round}(\alpha \cdot \zeta) \|$$

where $\| \cdot \|$ returns the Euclidean norm and round($\cdot$) is the rounding-off function. The resulting integer approximations are given by round($\alpha^* \cdot \zeta$). As posed above, (5) is a non-linear, unconstrained optimization problem. Its intractability indicates the application of computational search. In this case, we must impose a constraint on the search space.

The resulting integer combination can be evaluated by means of integer arithmetic, which requires only simple additions and bit-shift operations in hardware. As a consequence, only a single non-integer multiplication by $1/\alpha^*$ is required.
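
A constrained computational search of this kind can be sketched as a simple grid scan. In the sketch below, the search interval [1, 64], the step of 0.001, and all names are hypothetical choices made for illustration; the actual search constraints used for the reported designs are not specified here.

```python
import math

XI1 = math.sqrt(10.0)
XI2 = math.sqrt(5.0 + 2.0 * XI1)
ZETA = [XI1, XI2, XI1 * XI2]  # AI elements to be scaled jointly


def rounding_error(alpha, zeta=ZETA):
    """Euclidean norm of alpha*zeta - round(alpha*zeta)."""
    return math.sqrt(sum((alpha * z - round(alpha * z)) ** 2 for z in zeta))


def search_expansion_factor(lo=1.0, hi=64.0, step=0.001):
    """Grid search over an assumed bounded interval [lo, hi]."""
    best_alpha, best_err = lo, rounding_error(lo)
    steps = int(round((hi - lo) / step))
    for i in range(1, steps + 1):
        alpha = lo + i * step
        err = rounding_error(alpha)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


alpha_star, err_star = search_expansion_factor()
# Integers that stand in for the scaled irrational elements.
integer_approx = [round(alpha_star * z) for z in ZETA]
```

Dividing `integer_approx` by `alpha_star` recovers the AI elements to within `err_star / alpha_star`, which is why a single multiplication by the reciprocal of the expansion factor suffices at the end.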

VI. FPGA IMPLEMENTATION AND RESULTS

The architectures for Daub-12 filter banks were implemented on a Xilinx Virtex-6 xc6vc240t-1ff1156 device using the ML605 evaluation board. The designs were tested with different standard images submitted to the Daub-12 filter banks. Hardware results from the FPGA for the Daub-12 filter banks were verified with MATLAB.

For comparison, we also considered a version of the proposed system that operates over fixed-point arithmetic instead of AI-based arithmetic. For this purpose, we employed a word size of 8 bits with 6 fractional bits. In this case, the required filter banks were implemented by quantizing the exact filter coefficients into the fixed-point representation. The compression of the input image is shown in Fig. 5.

Fig. 5 Scale Down Process

Notice that the fixed-point scheme incurs coupled quantization noise, whereas the AI-based architectures are immune to this source of contamination. The AI-based architectures exhibit only uncorrelated and uncoupled quantization noise and offer the maximum frequency of operation among the designs compared. Since the design is speed optimized using fine-grain pipelining and parallel architectures, it is not anticipated to yield advantages in terms of power and area.

Fig. 6 Daub-12 Scale Up Process

In a sense, we traded the speed (maximum frequency) for power and resources. The new design is multi-encoded and multi-rate, operating over AI with no intermediate reconstruction steps. Error-free computation can be performed until the FRS. Fig. 7 illustrates the emphasis on output image quality and speed obtained by trading complexity.

A. Resource Consumption and Figures of Merit

Table II shows resource consumption for the Daub-6 and -12 filter banks. Monitored resources include: the number of slice registers, the look-up table (LUT) count, and the number of configurable logic blocks (CLB). Critical path delays (CPD), the maximum operating frequency, area-time product (AT), and AT$^2$ were selected as figures of merit.

The AT product is a standard performance metric in digital hardware design. It refers to chip-area and speed of the design. Lower AT values indicate a higher speed of operation. In an FPGA, the area (A) is provided by the number of slice LUTs used for logic given by the FPGA design tool called XFLOW and the time is simply the critical path delay.
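
For concreteness, the figures of merit can be computed as in the short sketch below. The function name and the LUT count are placeholders chosen for illustration, not measured results; only the 282.50 MHz clock figure is taken from the place-and-route result quoted for the Daub-6 design.

```python
def figures_of_merit(slice_luts, cpd_ns):
    """Area-time metrics with area = slice LUT count and time = critical path delay (ns)."""
    return {
        "fmax_mhz": 1000.0 / cpd_ns,       # maximum operating frequency
        "AT": slice_luts * cpd_ns,         # area-time product
        "AT2": slice_luts * cpd_ns ** 2,   # area-time-squared product
    }


# Hypothetical example: the LUT count below is a placeholder, not a reported value.
cpd_ns = 1000.0 / 282.50                   # critical path delay implied by 282.50 MHz
merit = figures_of_merit(slice_luts=1000, cpd_ns=cpd_ns)
```

Because AT² weights delay quadratically, it favors faster designs more strongly than the plain AT product does.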

B. Comparison With Existing Methods

A significant amount of work is published on 1-D and 2-D DWT VLSI architectures [1]-[3], [6], [13]. In particular, the designs proposed in [3], [6] address the Daub-6 and -12 wavelet analysis. Detailed data is also reported in [3], [6], allowing us to derive meaningful comparisons in Table II. Considering an 8-bit input word length, the obtained SNR and PSNR values for the proposed architectures were roughly 30–40% higher than those of the 1-D and 2-D DWT architectures described in [3], [6]. Among the FRS approaches we have mentioned, we used the canonical signed digit (CSD) approximation for comparison.
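
For reference, SNR and PSNR figures of this kind are commonly computed as in the sketch below. The definitions used here (signal energy over error energy, and a peak value of 255 for 8-bit data) are the usual textbook ones and are assumed rather than taken from the measurement setup; the function names and the sample data are illustrative.

```python
import math


def _flatten(image):
    return [float(p) for row in image for p in row]


def snr_db(reference, test):
    """SNR in dB: reference energy over error energy (images must differ)."""
    ref, tst = _flatten(reference), _flatten(test)
    signal = sum(r * r for r in ref)
    noise = sum((r - t) ** 2 for r, t in zip(ref, tst))
    return 10.0 * math.log10(signal / noise)


def psnr_db(reference, test, peak=255.0):
    """PSNR in dB for an assumed peak value (255 for 8-bit pixels)."""
    ref, tst = _flatten(reference), _flatten(test)
    mse = sum((r - t) ** 2 for r, t in zip(ref, tst)) / len(ref)
    return 10.0 * math.log10(peak * peak / mse)


# Toy data: an exact result and a version with a unit error on every pixel.
exact = [[100, 200], [50, 150]]
noisy = [[101, 201], [51, 151]]
snr = snr_db(exact, noisy)
psnr = psnr_db(exact, noisy)
```

In a comparison such as the one above, `reference` would be the exact (floating-point) decomposition and `test` the output of the fixed-point or AI-based hardware.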

TABLE II
COMPARISON OF DAUB-6 AND -12 PERFORMANCES
Measured for Single-Level Decomposition With 8-bit Input Data

Moreover, we compared the proposed architectures with several prominent VLSI 2-D DWT designs. The proposed architectures are also compared with recently published AI-based DWT architectures archived in the literature. Table II shows the comparison results. Notice that the Daub-12 FRS design based on the expansion factor method could offer an 11% improvement in the clock frequency when compared to the design based on CSD representation.

VII. CONCLUSIONS

We presented a multi-encoded AI-based 2-D wavelet filter bank architecture capable of arbitrarily high numerical accuracy. The introduced design employs AI-based arithmetic which is (i) error-free, (ii) defined over integers, and (iii) free of multiplications. By employing AI encoding, the resulting wavelet-decomposed images had SNR and PSNR figures improved by approximately 37–41% when compared to a counterpart fixed-point system with 8-bit word length and 6 fractional bits. Comparing with [1], the SNR and PSNR values for the proposed AI-based Daub-12 architecture were approximately 11–13% higher than the figures obtained from the Daub-6 architecture. A single FRS is the only source of computational error.

We proposed several designs for the FRS based on CSD representation and expansion factor scaling. These two methods allowed various configurations of accuracy and tolerable circuit complexities. Standard images were analyzed. FPGA-based four-level prototypes for Daubechies 6- and 12-tap wavelet filters are operational at a compilation target frequency of 100 MHz on the Xilinx ML605 board. Place-and-route timing analysis furnished 282.50 MHz and 146.42 MHz for the Daub-6 and -12 architectures, respectively.

Daub-6 and -12 single-level decomposition architectures were also FPGA prototyped with the Xilinx Virtex-6 device at 442.47 MHz and 274.72 MHz, respectively. The dynamic range of typical imaging applications is also increasing and more emphasis is being placed on picture quality. In the presence of higher resolution, increased dynamic range, and increased frame rate, there is no option but to increase the throughput of the digital filtering architectures.

REFERENCES


