All submissions of the EM system will be redirected to Online Manuscript Submission System. Authors are requested to submit articles directly to Online Manuscript Submission System of respective journal.

Implementation of Low-Complexity Multiplier using distributed arithmetic algorithm

J.Bamela Mary1, K.Ramamoorthy2
  1. PG Scholar (M.E., VLSI Design), PSNA college of engineering and technology, Dindigul, Tamilnadu, India
  2. Associate Professor, Dept. of ECE, PSNA college of engineering and technology, Dindigul, Tamilnadu, India
Related article at Pubmed, Scholar Google

Visit for more related articles at International Journal of Innovative Research in Science, Engineering and Technology


For efficient hardware implementation many designers designs several multiplier structure based on different techniques. But these designs are achieving only 30% of power reduction and 28% of area reduction. In this paper we propose a low complexity and low latency multiplier in order to reduce the requirement of power and area. The proposed work is fully based on the distributed arithmetic algorithm (DAA) which provides the better performance than the existing designs. The proposed design will be coded in verilogHDL and synthesized in Xilinx ISE9.2i. From the synthesized result we will prove the modified structure that requires less area and less power than the existing ones. Finally the proposed design will be implemented on FPGA spartan3E hardware.

Index Terms

All-one polynomial, finite field, systolic design.


Finite field multipliers over GF(2^m) have wide applications in elliptic curve cryptography (ECC) and error control coding systems . Polynomial basis multipliers are popularly used because they are relatively simple to design, and offer scalability for the fields of higher orders. Efficient hardware design for polynomial-based multiplication is therefore important for real-time application.All-one polynomial (AOP) is one of the classes of polynomials considered suitable to be used as irreducible polynomial for efficient implementation of finite field multiplication. Multipliers for the AOPbased binary fields are simple and regular, and therefore, a number of works have been explored on its efficient realization. Irreducible AOPs are not abundant. They are very often not preferred in cryptosystems for security reasons, and one has to make careful choice of the field order to use irreducible AOPs for cryptographic applications. The AOP-based multipliers can be used for the nearly AOP (NAOP) which could be used for efficient realization of ECC systems. AOP-based fields could also be used for efficient implementation of Reed- Solomon encoders.Besides, the AOP-based architectures can be used as a kernel circuit for field exponentiation, inversion, and division architectures. Systolic design is a preferred type of specialized hardware solution due to its high-level of pipeline ability, local connectivity and many other advantageous features. In a bit-parallel AOP-based systolic multiplier has been suggested by Lee et al. In a recent paper a low-complexity bit-parallel systolic Montgomery multiplier has been suggested. Very recently an efficient digit-serial systolic Montgomery multiplier for AOP-based binary extension field is presented. The systolic structures for field multiplication have two major issues.
First, the registers in the systolic structures usually consume large area and power. Second, the systolic structures usually have a latency of nearly n cycles, which is very often undesired for real-time applications. Therefore, in this paper, we have presented a novel register- sharing technique to reduce the register requirement in the systolic structure. The proposed algorithm not only facilitates sharing of registers by the neighboring PEs to reduce the register complexity but also helps reducing the latency. Cut-set retiming allows to introduce certain number of delays on all the edges in one direction of any cut-set of a signal flow-graph (SFG) by removing equal number of delays on all the edges in the reverse direction of the same cut-set . When all the edges are in a single direction, one can introduce any desired number of delays on all the edges of any cut-set of an SFG. Therefore, this technique is highly useful for pipelining digital circuits to reduce the critical path. In this paper, we have proposed a novel cut-set retiming approach to reduce the clock-period. The proposed structure is found to involve significantly less area-time-power complexity compared with the existing designs.


In fact, real-time signals may also be processed in this manner if the associated blockprocessing delay is acceptable. Another potentially important application for backward filtering is the implementation of Mallat two-channel iterated filter banks based on power-complementary Butterworth filters (wavelets). The zero-phase case is often used to implement frequency-selective infinite-impulse response (IIR) filters corresponding to the squaredmagnitude of the classical Butterworth, Chebyshev, and elliptic designs. However, other interesting and potentially important applications exist for non causal IIR filters that are not zero-phase. Examples include equalizers for non minimum- phase systems, non causal speech models, half-sample interpolators, and 90-degree phase shifters such as Hilbert transformers and differentiators. On the other hand, many fast algorithms in the context of digital filtering have been obtained based on particular matrix structures. Many approaches to block digital filters (BDFs) design exist. Some approaches compel the BDF to be time-invariant so that conventional filter synthesis techniques can be used. The best known and most widely used approach is Overlap-save. In some other approaches, no such constraint on the BDF is imposed so that the BDF can be time variant.


Let A,B and C are the extended polynomials and these are represented as:
Fig(3.1) General Signal Flow Graph
Architecture for 1 *N convolution and 1-bit input:
Most 2-D convolution implementations rely on the multiplying units embedded in modern FPGAs to carry out all the multiplications in parallel and to achieve great performance. However, as the kernel size increases, the number of embedded multipliers needed grows exponentially. This fact can constrain the kernel size or force to use a bigger FPGA device, which, in its turn, can yield a very high cost per operation ratio. On the other hand, a lot of work has been done on the design of multiplier less filters, mostly in the one-dimensional domain, and some authors have implemented 2-D convolution by replacing multiplications with shifting and adding operations or transforming the computation into the logarithmic domain.


Efficient systolic design for the multiplication over GF(2^m) based on irreducible AOP is obtained in my existing system. By using cut-set retiming technique the critical path is reduced to one XOR gate delay and by sharing of registers for the input-operands in the PEs, the low-latency bitparallel systolic multiplier have been derived. For self checking I have simulated in modelsim. For evaluating the performance parameter I have used Xilinx ISE 9.2i. In my existing system the total power consumption is 81mW, latency is 11.771ns and required number of gates are around 650. Moreover the existing design will be reconstructed for reduce the latency, power requirement and gate count by using distributed arithmetic algorithm.


  1. M. Ciet, J. J. Quisquater, and F. Sica, “A secure family of composite finite fields suitable for fast implementation of elliptic curve cryptography,” in Proc. Int. Conf. Cryptol. India, 2001, pp. 108–116.
  2. H. Fan and M. A. Hasan, “Relationship between Montgomery and shifted polynomial basis multiplication algorithms,” IEEE Trans. Computers, vol. 55, no. 9, pp. 1202–1206, Sep. 2006.
  3. C.-L.Wang and J-L. Lin, “Systolic array implementation of multipliers for finite fields ” IEEE Trans. Circuits Syst., vol. 38, no. 7, pp. 796–800, Jul. 1991.
  4. B. Sunar and C. K. Koc, “Mastrovito multiplier for all trinomials,” IEEE Trans. Comput., vol. 48, no. 5, pp. 522–527, May 1999.
  5. C. H. Kim, C.-P. Hong, and S. Kwon, “A digit-serial multiplier for finite field IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 4, pp. 476–483, 2005.
  6. C. Paar, “Low complexity parallel multipliers for Galois fields based on special types of primitive polynomials,” in Proc. IEEE Int. Symp. Inform. Theory, 1994, p. 98.
  7. H. Wu, “Bit-parallel polynomial basis multiplier for new classes of finite fields,” IEEE Trans. Computers, vol. 57, no. 8, pp. 1023–1031, Aug. 2008.
  8. S. Fenn, M.G. Parker,M. Benaissa, and D. Taylor, “Bit-serial multiplication in using all-one polynomials,” IEE Proc. Com. Digit. Tech., vol. 144, no. 6, pp. 391–393, 1997.
  9. K.-Y. Chang, D. Hong, and H.-S. Cho, “Low complexity bitparallel multiplier for ____ _ defined by all-one polynomials using redundant representation,” IEEE Trans. Computers, vol. 54, no. 12, pp. 1628–1629, Dec. 2005.
  10. H.-S. Kim and S.-W. Lee, “LFSR multipliers over defined by all-one polynomial,” Integr., VLSI J., vol. 40, no. 4, pp. 571– 578,2007.
  11. P. K. Meher, Y. Ha, and C.-Y. Lee, “An optimized design of serial-par- allel finite field multiplier for based on all-one polynomials,”in Proc. ASP-DAC, 2009, pp. 210–215.
  12. M. Sandoval, M. F. Uribe, and C. Kitsos, “Bit-serial and digit-serial montgomery multipliers using linear feedback shift regis- ters,” IET Comput. Digit. Tech., vol. 5, no. 2, pp. 86–94, 2011.
  13. C.-Y. Lee, E.-H. Lu, and J.-Y. Lee, “Bit-parallel systolic multipliers for ields defined by all-one and equally spaced polynomials,”IEEE Trans. Computers, vol. 50, no. 6, pp. 385–393, May 2001.
  14. Y.-R. Ting, E.-H. Lu, and Y.-C. Lu, “Ringed bit-parallel systolic mul- tipliers over a class of fields ,” Integr., VLSI J., vol. 38, no. 4,pp. 571–578, 2005.
  15. C.-Y. Lee, J.-S. Horng, I.-C. Jou, and E.-H. Lu, “Lowcomplexity bit-parallel systolic montgomery multipliers for special classes of,” IEEE Trans. Computers, vol. 54, no. 9, pp. 1061–1070,.