ISSN: Online (2320-9801), Print (2320-9798)


A Study and Analysis on Image Processing Techniques for Historical Document Preservation

Reka Durai¹ and Dr. V. Thiagarasu²
  1. M.Phil. Research Scholar, PG & Research Department of Computer Science, Gobi Arts & Science College (Autonomous), Bharathiar University, Gobichettipalayam, Tamilnadu, India
  2. Associate Professor, PG & Research Department of Computer Science, Gobi Arts & Science College (Autonomous), Bharathiar University, Gobichettipalayam, Tamilnadu, India

Abstract

Preserved historical documents degrade because of poor storage conditions and attack by pests and insects. Digital image filtering techniques can help preserve them: a corrupted document image is first converted from RGB to grayscale, and Gaussian and salt-and-pepper noise are then removed with median, Prewitt, Wiener, average, and Laplacian filters. These filters are used to clean and enhance the documents, and the best choice of filtering algorithm varies with noise density. This work compares the filtering techniques across noise levels and observes that the median, average, and Wiener filters perform better than the Laplacian and Prewitt filters.

 

Keywords

Image Enhancement Techniques, Filters, Noise Removal Techniques, Historical Documents, Sensing & Acquisition

INTRODUCTION

Historical documents are attacked by pests and insects. For this reason, most documents old enough to crumble at the slightest touch are sealed in tight containers to prevent insects from reaching them in any probable way. Connecting past and present is essential for finding the right path towards the future. Tales, written letters, and pictures are some of the key items most people hold on to in order to learn about the past [1]. These documents are the sole connection for one to better understand what indeed happened before.
The objective of this analysis is historical document preservation. Historical documents suffer from poor storage conditions and poor contrast between foreground and background due to humidity, paper deterioration, and ink seepage. Moreover, the fragility of these documents prevents access by many researchers, while a legible digitized version is far more accessible. Vast amounts of historical handwritten text are the property of state and national libraries, where the texts are converted to digital form so that the information is preserved in secondary sources even if the primary sources, such as ancient scrolls, degrade. Changing a scanned grayscale image into a binary image, retaining the foreground while removing the background, is an important step in many image analysis systems, including document image processing. Here, the filtering algorithms are implemented for various noise types [2].
Document image binarization (threshold selection) refers to the conversion of a grayscale image into a binary image. It is the initial step of most document image analysis and understanding systems [3]. The proposed method has been extensively tested with a variety of historical documents and has demonstrated better performance. The paper is organized as follows. Section 2 is a brief review of related work. Section 3 describes the proposed methodology, Section 4 describes the filtering algorithms in detail, Section 5 presents results and discussion, and finally, Section 6 draws conclusions.

RELATED WORK

In the literature, binarization is performed either globally or locally. Global methods (global thresholding) use a single calculated threshold value to classify image pixels into object or background classes, while local methods (adaptive thresholding) use local area information to guide the choice of threshold value for each pixel [4].
The binarization of images is a long-investigated field with considerable activity, and some of the methods have also been applied to documents and historical documents. One of the older methods in image binarization is Otsu's, based on the variance of pixel intensities. Bernsen calculates local thresholds using the neighbours [6]. Niblack uses the local mean and standard deviation. Sauvola presents a method specialized for document images that applies two algorithms in order to calculate a different threshold for each pixel.
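To make the global/local distinction concrete, the following MATLAB sketch binarizes the same document image both ways. This is a minimal sketch, not code from the paper: the file name is a placeholder, the adaptive sensitivity of 0.5 is an assumed default, and imbinarize/adaptthresh require a recent Image Processing Toolbox.

% Global (Otsu) vs. local (adaptive) binarization -- a minimal sketch.
I = imread('document.png');            % placeholder file name
if size(I, 3) == 3, I = rgb2gray(I); end

T = graythresh(I);                     % Otsu's threshold, normalized to [0, 1]
BWglobal = imbinarize(I, T);           % one threshold for the whole image

Tloc = adaptthresh(I, 0.5);            % per-pixel thresholds from local statistics
BWlocal = imbinarize(I, Tloc);         % assumed sensitivity of 0.5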

PROPOSED SYSTEM

The proposed methodology for degraded, poor-quality document images, covering text preservation and enhancement, is fully described in this section, which also analyses the techniques applied to historical documents.

Image Sensing & Acquisition

Getting a picture "into" a computer typically involves a CCD camera. Such a camera has, in place of the usual film, an array of photosites; these are silicon electronic devices whose voltage output is proportional to the intensity of light falling on them [5]. For a camera attached to a computer, information from the photosites is output to a suitable storage medium. Generally this is done in hardware, using a frame-grabbing card, as hardware is much faster and more efficient than software. This allows a large number of images to be captured in a very short time, on the order of one ten-thousandth of a second each. Fig. 1 shows the three principal sensor arrangements used to transform illumination energy into digital images. The idea is simple: incoming energy is transformed into a voltage by the combination of input electrical power and a sensor material that is responsive to the particular type of energy being detected [7]. The output voltage waveform is the response of the sensors, and a digital quantity is obtained from each sensor by digitizing its response.
When operating on grayscale images, one typically wishes to transform the intensity values. For example, one may need to reverse the black and white intensities, or may want to make the darks darker and the lights lighter.
An application of intensity transformations is to increase the contrast between certain intensity values so that features can be picked out of an image. For instance, the following two images show an image before and after an intensity transformation [8].
Types of Transformation Functions
• Photographic negative (using imcomplement)
• Logarithmic transformation (using c*log(1+f))
• Gamma transformation (using imadjust)
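A minimal MATLAB sketch of the three transformations listed above, assuming a grayscale uint8 input; the scaling constant c and the gamma value of 0.5 are illustrative choices, and the file name is a placeholder.

% Intensity transformations -- a minimal sketch.
I = imread('document.png');        % placeholder file name
if size(I, 3) == 3, I = rgb2gray(I); end

neg = imcomplement(I);             % 1. photographic negative

f = im2double(I);                  % 2. logarithmic transformation c*log(1+f)
c = 1 / log(1 + max(f(:)));        %    illustrative scaling constant
logT = c * log(1 + f);

gam = imadjust(I, [], [], 0.5);    % 3. gamma transformation (gamma = 0.5, illustrative)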

Removing Noise From Images

Noise in documents is classified according to whether it is dependent on the underlying content or independent of it. Stray marks, marginal noise, ink blobs, and salt-and-pepper noise are independent of the content. Content-dependent noise, by contrast, is comparatively more complex to model, mathematically non-linear, and often multiplicative. If noise shows dependable behaviour in terms of these properties, it is called regular noise [10].
This study considers two common types of noise: Gaussian and salt-and-pepper. The following example compares an averaging filter with a median filter for removing salt-and-pepper noise. Fig. 3 shows salt-and-pepper noise added to the original image (Fig. 2). This type of noise consists of random pixels being set to black or white. In both cases the size of the neighborhood used for filtering is 3×3.
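A minimal MATLAB sketch of this comparison; the file name is a placeholder and the 10% noise density matches the experimental setup described later.

% Compare a 3x3 averaging filter and a 3x3 median filter on salt-and-pepper noise.
I = imread('document.png');                      % placeholder file name
if size(I, 3) == 3, I = rgb2gray(I); end

J = imnoise(I, 'salt & pepper', 0.10);           % 10% noise density

avgK = imfilter(J, fspecial('average', [3 3]));  % averaging filter
medK = medfilt2(J, [3 3]);                       % median filter

figure;
subplot(2,2,1); imshow(I);    title('Original');
subplot(2,2,2); imshow(J);    title('Salt & pepper, 10%');
subplot(2,2,3); imshow(avgK); title('3x3 average');
subplot(2,2,4); imshow(medK); title('3x3 median');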

Gaussian Noise

Gaussian noise, also called Random Variation Impulsive Noise (RVIN) or normal noise, is a type of statistical noise in which the amplitude of the noise follows a Gaussian distribution [9]. Gaussian noise has the probability density function of the normal distribution, i.e., a bell-shaped curve.
Gaussian distribution noise can be expressed by:

P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

where P(x) is the Gaussian noise distribution in the image, and \mu and \sigma are the mean and standard deviation respectively.
Salt-and-pepper Noise
Salt-and-pepper noise is also called fat-tail distributed noise, impulsive noise, or spike noise [12]. An image containing salt-and-pepper noise will have dark pixels in bright regions and bright pixels in dark regions. Salt-and-pepper noise is predominantly found in digital transmission and storage. It can be described as:
I(t) = S(t) + e \cdot N(t), \quad e \in \{0, 1\}
where S(t) represents the dark pixels in bright regions, N(t) the bright pixels in dark regions, and I(t) the overall salt-and-pepper noise in the given image, with e ∈ {0, 1} occurring with probability P. There is an equal 50% probability of occurrence of either black or white pixels within the image, giving rise to salt-and-pepper noise [11]. To simulate the effects of the problems listed above, the toolbox provides the imnoise function, which can be used to add various types of noise to an image.
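For instance, a minimal sketch of adding both noise types with imnoise; the density and the Gaussian variance are illustrative, and I is a grayscale image as in the earlier sketches.

% Adding synthetic noise with imnoise -- a minimal sketch.
Jsp = imnoise(I, 'salt & pepper', 0.10);   % 10% of pixels set to black or white
Jg  = imnoise(I, 'gaussian', 0, 0.01);     % zero mean, variance 0.01 (illustrative)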

IMAGE FILTERING ALGORITHMS

In image processing, filters are mainly used to suppress either the high frequencies in the image, i.e., smoothing the image, or the low frequencies, i.e., enhancing or detecting edges in the image. Image restoration and enhancement techniques operate in either the spatial domain or the frequency domain (via Fourier transforms). Noise removal is easier in the spatial domain than in the frequency domain, as spatial-domain noise removal requires much less processing time [13]. Spatial processing is classified into point and mask processing. Point processing transforms individual pixels independently of the other pixels in the image. These simple operations are typically used to correct defects in image acquisition hardware, for example to compensate for under- or over-exposed images. In mask processing, on the other hand, a pixel together with its neighborhood of pixels in a square or circular mask is used to generate the pixel at coordinates (x, y) in the enhanced image. This is a more costly operation than simple point processing, but more powerful.
The application of a mask to an input image produces an output image of the same size as the input. One of the most significant requirements of noise removal algorithms is that they should provide a suitable amount of noise removal while also preserving the edges [19]. Two types of filters address these conditions, each with significant advantages and disadvantages: linear and non-linear filters. Linear filters have the advantage of faster processing but the disadvantage of not preserving edges; conversely, non-linear filters preserve edges but have the disadvantage of slower processing.

Median Filter

It is important to perform noise removal when processing an image or a signal. One method of noise reduction is neighborhood averaging, which can suppress isolated out-of-range noise, but with the side effect of blurring sudden changes such as sharp edges [14]. The median filter is an effective method that can suppress isolated noise without blurring sharp edges. In median filtering, the pixel values in a neighborhood are sorted into numerical order and the center pixel is replaced with the middle value.
Let w represent a neighborhood centered on location (m, n) in the image. The median filter is then given by

y[m, n] = median{ x[i, j] : (i, j) ∈ w }

where y[m, n] is the output pixel at coordinates (m, n) and w is the set of neighborhood positions surrounding (m, n) [18]. The filter thus takes the median of all pixels x[i, j] with (i, j) in the neighborhood. The result of the median filter is shown in Fig. 4.
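A direct transcription of this definition for a 3×3 window, alongside the equivalent toolbox call; this is a sketch, and replicate-padding at the borders is one of several reasonable choices.

% Median filtering -- a direct transcription of y[m,n] = median{x[i,j]}.
J = imnoise(I, 'salt & pepper', 0.10);   % I: grayscale image as before
P = padarray(J, [1 1], 'replicate');     % pad so every pixel has a full 3x3 window
y = J;                                   % preallocate output
for m = 1:size(J, 1)
    for n = 1:size(J, 2)
        w = P(m:m+2, n:n+2);             % 3x3 neighborhood around (m, n)
        y(m, n) = median(w(:));          % middle value of the sorted window
    end
end
y2 = medfilt2(J, [3 3]);                 % equivalent (and much faster) toolbox call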

Wiener Filter

Inverse filtering is a restoration technique for deconvolution: when the image has been blurred by a known low-pass filter, it is possible to recover the image by inverse filtering or generalized inverse filtering. However, inverse filtering is very sensitive to additive noise [14]. The approach of reducing one degradation at a time allows us to develop a restoration algorithm for each type of degradation and simply combine them. Wiener filtering executes an optimal trade-off between inverse filtering and noise smoothing: it removes the additive noise and inverts the blurring simultaneously [20].
Wiener filtering is optimal in terms of the mean square error; in other words, it minimizes the overall mean square error in the process of inverse filtering and noise smoothing. The Wiener filter is a linear estimate of the original image, based on a stochastic framework. The orthogonality principle implies that the Wiener filter in the Fourier domain can be expressed as follows:
W(f_1, f_2) = \frac{H^*(f_1, f_2)\, S_{xx}(f_1, f_2)}{|H(f_1, f_2)|^2\, S_{xx}(f_1, f_2) + S_{\eta\eta}(f_1, f_2)}

where S_{xx}(f_1, f_2) and S_{\eta\eta}(f_1, f_2) are respectively the power spectra of the original image and the additive noise, and H(f_1, f_2) is the blurring filter. It is easy to see that the Wiener filter has two parts: an inverse filtering part and a noise smoothing part.
It not only performs deconvolution by inverse filtering (high-pass filtering) but also removes noise with a compression operation (low-pass filtering). Fig. 5 shows the result of the Wiener filter.
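In MATLAB, two toolbox functions relate to this section: wiener2 performs adaptive local Wiener noise smoothing, and deconvwnr performs Wiener deconvolution when the blurring filter is known. A minimal sketch, with an illustrative blur kernel and noise parameters:

% Wiener filtering -- a minimal sketch (I: grayscale image as before).
Jn = imnoise(I, 'gaussian', 0, 0.01);   % noisy, unblurred image
Ks = wiener2(Jn, [3 3]);                % adaptive local Wiener smoothing

H  = fspecial('gaussian', [7 7], 2);    % illustrative blur kernel
Jb = imnoise(imfilter(I, H, 'replicate'), 'gaussian', 0, 0.001);
nsr = 0.001 / var(im2double(I(:)));     % estimated noise-to-signal power ratio
Kd = deconvwnr(Jb, H, nsr);             % Wiener deconvolution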

Average Filter

Mean filtering is a simple, intuitive, and easy-to-implement method of smoothing images that reduces the amount of intensity variation between one pixel and the next. Average filtering replaces each pixel value in an image with the mean value of its neighbors, including itself. The simplest procedure applies the mask at every pixel in the image [15]: the mean of all the pixels falling under the mask becomes the new pixel value. This has the effect of eliminating pixel values that are unrepresentative of their surroundings. The average filter is also considered a convolution filter or mean filter; Fig. 6 shows its result.
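A minimal sketch of a 3×3 average filter; the kernel is simply a normalized box of ones, and J is the noisy image from the earlier sketches.

% 3x3 average (mean) filtering -- a minimal sketch.
h = fspecial('average', [3 3]);     % equivalently: h = ones(3) / 9;
K = imfilter(J, h, 'replicate');    % replace each pixel with its neighborhood mean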

Laplacian Filter

Edges within an image can be detected with the Laplacian filter, which marks areas where the intensity changes rapidly, producing an image of the edges. The Laplacian is often applied to an image that has first been smoothed with something approximating a Gaussian smoothing filter, in order to reduce its sensitivity to noise [16]. The operator normally takes a single gray-level image as input and produces another gray-level image as output. As the radius of attention on the image is increased, the method becomes more computationally costly.
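A minimal sketch; fspecial('log') combines the Gaussian pre-smoothing and the Laplacian into a single Laplacian-of-Gaussian kernel, and the kernel size and sigma are illustrative.

% Laplacian edge detection -- a minimal sketch (I: grayscale image as before).
hLap = fspecial('laplacian', 0.2);                 % plain Laplacian kernel
hLoG = fspecial('log', [9 9], 2);                  % Gaussian smoothing + Laplacian in one kernel
eLap = imfilter(im2double(I), hLap, 'replicate');
eLoG = imfilter(im2double(I), hLoG, 'replicate');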

Prewitt Filter

The Prewitt filter is an edge detection operator [15]; Fig. 7 shows its result. The Prewitt filter is a discrete differentiation operator that computes an approximation of the gradient of the image intensity function.
At each point in the image, the result of the Prewitt operator is either the corresponding gradient vector or the norm of this vector. The Prewitt operator is based on convolving the image with a small, separable, integer-valued filter in the horizontal and vertical directions and is therefore relatively low-cost in terms of computation [17]. On the other hand, the gradient approximation it produces is comparatively rough, in particular for high-frequency variations in the image. The Prewitt filter works by computing, from the responses of two 3×3 kernels, the square root of the sum of their squares.
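A minimal sketch of this computation; fspecial('prewitt') returns the horizontal-edge kernel, and its transpose gives the vertical-edge kernel.

% Prewitt gradient magnitude -- a minimal sketch (I: grayscale image as before).
hP = fspecial('prewitt');                       % [1 1 1; 0 0 0; -1 -1 -1]
Gy = imfilter(im2double(I), hP,  'replicate');  % vertical gradient (horizontal edges)
Gx = imfilter(im2double(I), hP', 'replicate');  % horizontal gradient (vertical edges)
Gmag = sqrt(Gx.^2 + Gy.^2);                     % root of the sum of squares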

RESULT AND DISCUSSION

To analyse the filtering algorithms, the following steps are performed (a sketch of this pipeline appears after the list):
a) First, an uncorrupted document image is taken as input.
b) Next, the document image is converted from RGB to grayscale.
c) Different noises are added to the document image artificially with 10% noise density.
d) The filtering algorithms are applied to reconstruct the document images.
e) To test the performance of the filters for varying noise density, Gaussian noise with different variances is applied to the binary document image.
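A minimal MATLAB sketch of this pipeline under the stated setup; the file name and the variance sweep are illustrative.

% Evaluation pipeline -- a minimal sketch of steps (a) through (e).
I = imread('document.png');                  % (a) uncorrupted input (placeholder name)
if size(I, 3) == 3, I = rgb2gray(I); end     % (b) RGB to grayscale

J = imnoise(I, 'salt & pepper', 0.10);       % (c) 10% noise density

filtered = struct( ...                       % (d) apply the five filters
    'median',    medfilt2(J, [3 3]), ...
    'average',   imfilter(J, fspecial('average', [3 3])), ...
    'wiener',    wiener2(J, [3 3]), ...
    'laplacian', imfilter(J, fspecial('laplacian')), ...
    'prewitt',   imfilter(J, fspecial('prewitt')));

for v = [0.001 0.01 0.05]                    % (e) sweep the Gaussian noise variance
    Jg = imnoise(I, 'gaussian', 0, v);
    % apply the same filters to Jg and compare the results, e.g. with psnr
end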

CONCLUSION AND FUTURE WORK

In this work, five filtering algorithms were applied to salt-and-pepper noise of the kind introduced into a document during image capture and transmission. The results show that the median, average, and Wiener filters perform better than the Laplacian and Prewitt filters, and that the median filter is the best at removing salt-and-pepper noise. Only five algorithms were applied here; in future work, further algorithms can be applied, tested, and analysed to determine which best removes salt-and-pepper noise.


References


  1. International Journal of Computer Applications (0975 - 888), Volume 48, No. 12, June 2012.
  2. Salem Saleh Al-amri, N.V. Kalyankar and S.D. Khamitkar, "A comparative study of removal noise from remote sensing image", IJCSI International Journal of Computer Science Issues, Vol. 7, Issue 1, No. 1, January 2010.
  3. Masoud Nosrati, Ronak Karimi and Mehdi Hariri, "Detecting circular shapes from areal images using median filter and CHT", Global Journal of Computer Science and Technology, Volume 12, January 2012.
  4. G. Padmavathi, P. Subashini, M. Muthu Kumar and Suresh Kumar Thakur, "Comparison of filters used for underwater image pre-processing", IJCSNS International Journal of Computer Science and Network Security, Vol. 10, No. 1, January 2010.
  5. Tudor Barbu, "Variational Image Denoising Approach with Diffusion Porous Media Flow", 2013.
  6. A. Buades, B. Coll and J.M. Morel, "The staircasing effect in neighborhood filters and its solution", IEEE Transactions on Image Processing, 15(6), 1499-1505, 2006.
  7. Mamta Juneja and Parvinder Singh Sandhu, "Design and Development of an Improved Adaptive Median Filtering Method for Impulse Noise Detection", International Journal of Computer and Electrical Engineering, Vol. 1, No. 5, December 2009.
  8. K. Subr, C. Soler and F. Durand, "Edge-preserving multiscale image decomposition based on local extrema", ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 28(5), 2009.
  9. Robert Fisher, Simon Perkins, Ashley Walker and Erik Wolfart, "Image Synthesis - Noise Generation", 2013.
  10. Jayaraman et al., Digital Image Processing, Tata McGraw Hill Education, 2009.
  11. E. Arias-Castro and D.L. Donoho, "Does median filtering truly preserve edges better than linear filtering?", 2009.
  12. M. Reuter, S. Biasotti, D. Giorgi, G. Patane and M. Spagnuolo, "Discrete Laplace-Beltrami operators for shape analysis and segmentation", 2009.
  13. J.M.S. Prewitt, "Object Enhancement and Extraction", in Picture Processing and Psychopictorics, Academic Press, 2005.
  14. Thomas Kailath, Ali H. Sayed and Babak Hassibi, Linear Estimation, Prentice-Hall, NJ, 2000, ISBN 978-0-13-022464-4.
  15. Subhojit Sarker and Swapna Devi, "Effect of Non-Local Means Filter in a Homomorphic Framework Cascaded with Bacterial Foraging Optimization to Eliminate Speckle", CiiT International Journal of Digital Image Processing, Vol. 3, No. 2, February 2011.
  16. V.R. Vijaykumar, P.T. Vanathi, P. Kanagasabapathy and D. Ebenezer, "Robust Statistics Based Algorithm to Remove Salt and Pepper Noise in Images", International Journal of Information and Communication Engineering, 5:3, 2009.
  17. Z. Ma, H.R. Wu and D. Feng, "Partition based vector filtering technique for suppression of noise in digital color images", IEEE Transactions on Image Processing, 15 (2006), 2324-2342.
  18. H. Liu, Y. Guo and G. Zheng, "Image denoising based on least squares support vector machines", in: The Sixth World Congress on Intelligent Control and Automation (WCICA 2006), vol. 1, June 2006, pp. 4180-4184.
  19. S. Li and D. Huang, "Image denoising using non-negative sparse coding shrinkage algorithm", in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, June 2005, pp. 1017-1022.